Effectiveness of an Interactive Map Display in Tutoring
Geography 


Allan Collins, Marilyn Jager Adams, and Richard W. Pew 


Bolt Beranek and Newman Inc. 


The purpose of this study was to evaluate the teaching effectiveness of differ- 
ent aspects of the SCHOLAR computer-assisted instruction system. The exper- 
iment compared how well students learn using SCHOLAR with (a) the interac- 
tive map display of Map-SCHOLAR, (b) a static labeled map, and (c) an unla- 
beled map. The results of the experiment showed that the students learned 
significantly more with the interactive map display than with either the la- 
beled map or the unlabeled map. A new method called backtrace analysis was 
used to assess the effectiveness of specific aspects of the tutoring strategy and 
the map system used in the experiment. 


In developing SCHOLAR, Carbonell 
(1970; Carbonell & Collins, 1973) took a first 
step toward a computer-assisted instruction 
system that is capable of conducting general 
tutorial dialogues with students. In SCHOLAR,
knowledge is not stored as text but in a
precisely structured semantic network of 
interrelated facts and concepts (Collins & 
Quillian, 1972b; Quillian, 1968). Every con- 
cept used to describe a given concept can 
itself be described elsewhere in the network. 
Thus, in a nontrivial sense, the program can 
“understand” the concepts it uses. SCHOLAR 
also has different subroutines that use the 
structure of the network to formulate ques- 
tions for the student and evaluate the re- 
sponses, answer the student’s questions, 
make inferences and computations, select 
new topics for discussion, and so on. The 
attempt is to structure information like 
human knowledge, so that the program can 
use its knowledge as flexibly as a human 
tutor does. 
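
The kind of structure described above can be sketched in a few lines of Python. The sketch is only illustrative: the dictionary layout, the attribute names, and the functions `describe` and `answer` are assumptions made for this example and are not taken from the SCHOLAR implementation. It shows how each concept used to describe another concept can itself be looked up in the same network.

```python
# Hypothetical miniature semantic network. Node and attribute names are
# illustrative assumptions, not SCHOLAR's actual data base.
NETWORK = {
    "Brazil": {"superconcept": "country", "capital": "Brasilia",
               "cities": ["Brasilia", "Manaos", "Sao Paulo"]},
    "Brasilia": {"superconcept": "city", "location": "Central Brazil"},
    "Manaos": {"superconcept": "city", "location": "Amazon"},
    "Sao Paulo": {"superconcept": "city"},
    "Amazon": {"superconcept": "river"},
    "country": {"superconcept": "place"},
    "city": {"superconcept": "place"},
    "river": {"superconcept": "place"},
}


def describe(concept, depth=1):
    """Return the attributes of a concept, expanding values that are nodes."""
    node = NETWORK.get(concept, {})
    if depth == 0:
        return dict(node)
    expanded = {}
    for attribute, value in node.items():
        values = value if isinstance(value, list) else [value]
        # A value that is itself a node is described one level deeper.
        expanded[attribute] = {
            v: describe(v, depth - 1) if v in NETWORK else {} for v in values
        }
    return expanded


def answer(concept, attribute):
    """Answer a simple question such as 'What is the capital of Brazil?'."""
    return NETWORK.get(concept, {}).get(attribute)
```

Because the description of Brazil points to the node for Brasilia, a question about the capital can be followed by a question about where the capital is located without any stored text.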


This research was sponsored by the Personnel and 
Training Research programs, Psychological Sciences 
Division, Office of Naval Research, and the Advanced 
Research Projects Agency, Department of Defense, 
under Contract N00014-76-C-0083, Contract Authority 
Identification Number NR 154-379. 

We would like to thank Nelleke Aiello, Susan M. 
Graesser, and Barbara N. Freeman for programming the 
system and carrying out the experiments and data 
analyses described. 

Requests for reprints should be sent to Allan Collins, 
Bolt Beranek and Newman Inc., 50 Moulton Street, 
Cambridge, Massachusetts 02138.

Because of its inherent flexibility, the
system can be extended in a variety of ways. 
For example, SCHOLAR has been modified 
to teach different kinds of knowledge (Col- 
lins & Grignetti, Note 1) and to use different 
teaching strategies (Collins, Warnock, & 
Passafiume, 1975). The educational question 
is which of the possible ways of varying 
SCHOLAR increase its teaching effectiveness. 
This question is being investigated by run- 
ning systematic experimental evaluations of 
different system configurations and different 
teaching strategies (Collins & Adams, 
1977). 

The experiment reported here was de- 
signed to test the utility of the map capa- 
bility recently added to the original 
SCHOLAR program for teaching geography. 
The Map-SCHOLAR system was developed 
to integrate the tutoring of graphic infor- 
mation with verbal information. In view of 
the evidence that pictorial information can 
be remembered more easily than verbal in- 
formation (cf. Bower, 1972; Paivio, 1971), we 
expected this capability to increase SCHOL- 
AR’s teaching effectiveness. 

Map-SCHOLAR can discuss with the stu- 
dent different maps that change dynamically 
according to the context of the discussion. To 
do this, a graphic structure was created that 
parallels the structure in the semantic net- 
work. The elements in the map display can 
be referred to either by their names, or by 
pointing to them, or both. Map-SCHOLAR 
both asks and answers map-related questions
and provides relevant map information
when the student makes a mistake. In short, 
Map-SCHOLAR has all the capabilities of the 
original SCHOLAR with the addition of the 
map capabilities. 

The first three figures illustrate some of 
the variety of interactions possible with 
Map-SCHOLAR. Figure 1 illustrates how 
Map-SCHOLAR asks map-related questions, 
evaluates the student’s answers, and corrects 
any mistakes. First, the dots indicating the 
location of the cities appear unlabeled on the 
map of Brazil and start blinking. Then 
SCHOLAR asks the student to name the 
blinking cities. When the student responds, 
it indicates which answers were correct and 
prints their names on the map. Because 
Lima was given erroneously as an answer 
(see Figure 1), Map-SCHOLAR pointed out 
that Lima is in Peru and showed where Lima 
is on the map. Thus, Map-SCHOLAR gives 
both verbal and visual feedback to help the 
student learn the material. 
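
The answer evaluation illustrated in Figure 1 can be sketched as follows. This is a minimal sketch under stated assumptions: the function `evaluate_named_cities` and the tables `CITY_COUNTRY` and `PRINCIPAL_CITIES` are hypothetical names invented for illustration, and the feedback strings only imitate the wording shown in the figure.

```python
# Hypothetical data for illustration only; not SCHOLAR's data base.
CITY_COUNTRY = {
    "Manaos": "Brazil", "Sao Paulo": "Brazil", "Rio de Janeiro": "Brazil",
    "Brasilia": "Brazil", "Lima": "Peru",
}
PRINCIPAL_CITIES = {
    "Brazil": ["Manaos", "Sao Paulo", "Rio de Janeiro", "Brasilia"],
}


def evaluate_named_cities(country, answers):
    """Split a student's answers into correct, wrong, and missed cities."""
    expected = PRINCIPAL_CITIES[country]
    correct = [a for a in answers if a in expected]
    wrong = [a for a in answers if a not in expected]
    missed = [c for c in expected if c not in answers]
    feedback = []
    if correct:
        verb = "is" if len(correct) == 1 else "are"
        feedback.append(f"{' and '.join(correct)} {verb} correct.")
    for city in wrong:
        home = CITY_COUNTRY.get(city)
        if home is not None:
            # Verbal correction; a map system would also display the city.
            feedback.append(f"{city} is in {home}, not in {country}.")
        else:
            feedback.append(f"{city} is not a principal city of {country}.")
    return {"correct": correct, "wrong": wrong, "missed": missed,
            "feedback": feedback}
```

The list of missed cities is what a map system would use to decide which dots to keep blinking and which related facts to present next.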

Figure 2 shows how context affects the 


Now let's see what you remember about the cities of Brazil. 
Please name the principal cities of Brazil now blinking on the 
display screen. 

Manaos, Sao Paulo, and Lima. 


Manaos and Sao Paulo are correct. Lima is in Peru, not in Brazil. 
The principal cities of Brazil are blinking on the screen. 


Rio de Janeiro is in Southeastern Brazil. It is on Guanabara Bay 
and the Atlantic. 


Brasilia is in Central Brazil. It is far from the ocean. 


Figure 1. Interactions with Map-SCHOLAR showing
answer evaluation and error correction.


Please point to the following cities of the Amazon in the order
in which they appear below: Manaos, Belem, and ...
Don't forget to let me know when you are finished. Remember
to type an asterisk * followed by a carriage return.


Manaos is correct. You did not point at Belem and ... correctly.
The Cities of the Amazon are blinking on the screen.


Figure 2. Interactions with Map-SCHOLAR showing
effects of context.


map display. The topic is the Amazon, which
appears in this context on the map of Brazil,
the overall topic. The question asks the
student to point at the cities along the
Amazon in the order named. The dots for the
three cities first appear unlabeled, and only
after the student has answered do the names
appear. Unlike the first map, this map shows
only the cities on the Amazon and their
relation to the Amazon. There is an
expansion of detail, so that a city like Belem,
which was not important enough to be
included among the cities of Brazil, is included
with the subtopic of the Amazon. Because
the map changes dynamically as the context
changes and as the student interacts with the
system, the student's attention is focused on
the relevant information, and questions can
be posed in a visual form not possible with a
static map display.
Figure 3 shows the system's potential for
tutorial interaction. When the student did
not know about Lake Maracaibo, Map-SCHOLAR
showed it on the screen and added
the related verbal information about the oil
there. This example also illustrates some of
the ways in which the student can ask

Let's begin by talking about the topography of Venezuela. The
principal lake of Venezuela is blinking, please name it.


I don't know.
The principal lake of Venezuela is blinking on the screen.


Lake Maracaibo is in Northwestern Venezuela. The principal 
mineral is oil. 


Is this country Colombia? 
Yes, that's Colombia. 
Where is the capital of Venezuela? 


Caracas is blinking. 


CARACAS 


Figure 3. Interactions with Map-SCHOLAR showing 
questions by student. 


Map-SCHOLAR to clarify or amplify the information
given (Collins & Warnock, Note
2). For the first question the student both 
pointed at and named Colombia to ask if it 
is the country near Lake Maracaibo. For the 
second question, the student verbally asked 
where the capital of Venezuela is, perhaps to 
find out how far away it is. SCHOLAR figured 
out semantically that the capital is Caracas 
and then visually showed where Caracas is 
by blinking it. These examples illustrate 
some of the power for tutorial interaction 
that can be obtained by a close integration 
between semantic and visual knowledge. 
To test the utility of the map system for 
teaching, we conducted an experiment in 
which each student learned about a different 
country under one of three conditions: One 
condition used SCHOLAR on the map system; 
the second condition used SCHOLAR on a 
nongraphic terminal, but the student could 
look at a labeled map of the country; the 
third condition was like the second, except 
that the student was given an unlabeled
map. Students’ learning for each of the three


kinds of training sessions was measured by 
comparing their scores on a pretest to those 
on a posttest given 3 days after the last 
training session. 

A second goal of this experiment was to 
investigate how specific aspects of the tuto- 
rial dialogue affect students’ learning. To 
study this question, we developed a tech- 
nique called backtrace analysis. The tech- 
nique involves marking each piece of infor- 
mation that is discussed, according to the 
kind of exchange involved (e.g., a question 
requiring a pointing response vs. a naming 
response). By comparing these data to the 
student’s answers on the posttest, it is pos- 
sible to identify the kinds of tutorial inter- 
actions that most strongly influence the 
student’s learning. 


Method 


Subjects 


The initial group of subjects included nine high school 
students. The study was replicated with nine university 
students. All subjects were volunteers and were paid for 
their services. 


Design 


There were three experimental conditions: a Map- 
SCHOLAR condition, a labeled map condition, and an 
unlabeled map condition. The Map-SCHOLAR condition 
was run on an Imlac graphic terminal with the screen 
divided between maps and verbal communications as 
shown in the first three figures. The student could input 
questions and answers by a keyboard and an electronic 
pointer (a “mouse”). The labeled and unlabeled map 
conditions were run on a keyboard terminal using a 
nongraphic version of SCHOLAR called “Tutor- 
SCHOLAR” (Collins et al., 1975). The two versions of 
SCHOLAR were identical with respect to both teaching 
strategy and information in the data base, except that 
Map-SCHOLAR handled all location-related questions 
in terms of the map, whereas Tutor-SCHOLAR handled 
them verbally. In the labeled map condition, subjects 
were given a paper map marked with all the places 
(names and locations) included in the Map-SCHOLAR 
data base. In the unlabeled map condition, subjects were 
given copies of the same maps, without the place names. 
For both of these conditions, students were instructed 
not to mark the maps. The pretest, posttest, and the 
final questionnaire were given in paper-and-pencil 
format. 


Procedure 


Each student participated in a preliminary session, 
three tutorial sessions, and a posttest session. The first
purpose of the preliminary session was to administer the
pretest. The pretest measured the student's preexper- 
imental knowledge about the information to be tutored 
and consisted of 20 basic questions about the geography 
of each of the three relevant countries: Argentina, 
Brazil, and Venezuela. A secondary purpose of the 
pretest was to ascertain that no subject was inordinately 
familiar or unfamiliar with any one of these three 
countries, since such inequalities in prior knowledge 
would confound measures of teaching effectiveness. 
After having completed the pretest, the students were 
given a brief introductory lesson on a fourth country, 
Chile, using Map-SCHOLAR. The purpose of this lesson 
was to familiarize the students with the system and its 
capabilities or, more specifically, with the kinds of 
questions they would be asked, the kinds of answers that 
were expected, the kinds of questions they could ask of 
SCHOLAR, the use of the keyboard and the pointer, and 
the methods by which they could correct any input er- 
rors. 

The tutorial phase of the experiment consisted in 
three 2-hour sessions, administered on consecutive days. 
During these sessions, each student learned about one 
country in the Map-SCHOLAR condition, one in the la- 
beled map condition, and one in the unlabeled map 
condition. Each lesson lasted for 1 hour. After the stu- 
dent had received one lesson on each of the three 
countries, the series was repeated. The combinations 
of countries and teaching modes were counterbalanced 
and ordered according to a 3 X 3 confounded factorial 
design (Winer, 1971, p. 646). 
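
The idea of counterbalancing countries against teaching modes can be illustrated with a simple cyclic Latin square. This is only a sketch of the general principle and is not the confounded factorial plan from Winer (1971) that was actually used; the function names are hypothetical, and the sketch covers only the pairing of countries with modes, not the ordering of lessons.

```python
COUNTRIES = ["Argentina", "Brazil", "Venezuela"]
MODES = ["Map-SCHOLAR", "labeled map", "unlabeled map"]


def latin_square_assignments(countries, modes):
    """Return one country-to-mode assignment per cyclic rotation of modes."""
    n = len(modes)
    return [
        {country: modes[(i + shift) % n] for i, country in enumerate(countries)}
        for shift in range(n)
    ]


def is_balanced(assignments, countries, modes):
    """Check that each country meets each mode exactly once across groups."""
    return all(
        sorted(a[country] for a in assignments) == sorted(modes)
        for country in countries
    )
```

Each of the three assignments would be given to a different subset of students, so that no country is always taught in the same mode and differences among countries cannot masquerade as differences among modes.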

The final session was conducted 3 days after the last 
tutorial session. In this session, the students took the 
posttest and completed a questionnaire on those aspects 
of the lessons that they had found most and least 
helpful. The posttest was divided into three parts. The 
first part consisted of 36 basic questions (including the 
20 that had been on the pretest) about each of the three 
countries. For the second part of the posttest, the stu- 
dents were given a map of each of the three countries 
and were asked to label the geographical features indicated.
The third part of the posttest consisted of 32
more difficult questions about each of the countries. 


Backtrace Analysis 


To assess the value of specific aspects of the tutorial 
exchange, we developed the technique of backtrace 
analysis. This technique involves marking each entry 
in SCHOLAR’s data base with respect to the way that it 
is treated during a given tutorial session. This infor- 
mation can subsequently be retrieved, enabling us to 
evaluate the effectiveness of SCHOLAR’s various inter- 
active capabilities from the probabilities with which 
they result in correct answers on the posttest. 

More specifically, each item that was discussed in a
given session was tagged with information concerning
(a) [...], (b) the context, and (c) the
event in which it arose. For purposes of the backtrace
analysis, the training events were classified as follows:

True-false correct. SCHOLAR presents a true-false
question that the student answers correctly. SCHOLAR
indicates that the student is correct and moves on to
new information.

True-false error. SCHOLAR presents a true-false 
question, and the student answers incorrectly or pleads 
ignorance. SCHOLAR points out the correct answer and 
goes on. 

Name correct. The student correctly names a geo- 
graphical feature(s) in response to SCHOLAR's request. 
Each answer among a set of answers is tagged individ- 
ually. This category subsumes what and where ques- 
tions as well as fill-in-the-blanks and naming requests 
by SCHOLAR. 

Name error. The student incorrectly names or fails 
to name a geographical feature when questioned by 
SCHOLAR. 

SCHOLAR error correction. If the student completes 
a fill-in question erroneously, SCHOLAR infers the basis 
of the student’s error and then presents new informa- 
tion to distinguish between the student's answer and the 
correct answer. 

SCHOLAR elaboration. If the student misses a 
question, SCHOLAR presents related information at the 
same level of importance (see Figure 3). The related 
material is tagged as an elaboration. 

Student question. Information is introduced as the 
result of a question that the student asks of SCHOL- 
AR. 

In addition to the above, there were several categories 
of training events that occurred only in Map-SCHOLAR. 
SCHOLAR treated these events like fill-ins, but they were 
distinctively marked for purposes of the backtrace 
analysis:

Label. SCHOLAR asks the student to name those
features of the map that are blinking. 

Point. SCHOLAR asks the student to point to the 
specified geographical features on the map. 

Label and point. SCHOLAR asks the student to name 
and point to a specified set of geographical features. 
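
The tagging and retrieval steps of backtrace analysis can be sketched as follows. The record layout, the function name `backtrace`, and the example data are assumptions made for illustration; only the event labels follow the categories listed above. The sketch computes, for each kind of training event, the proportion of tagged items that were answered correctly on the posttest.

```python
from collections import defaultdict


def backtrace(training_log, posttest):
    """Proportion of posttest-correct items for each training event type.

    training_log: iterable of (student, item, event) tuples, one per
        training event in which the item was discussed.
    posttest: dict mapping (student, item) -> True or False.
    """
    tagged = defaultdict(set)  # event -> set of (student, item) pairs
    for student, item, event in training_log:
        if (student, item) in posttest:
            tagged[event].add((student, item))
    return {
        event: sum(posttest[key] for key in keys) / len(keys)
        for event, keys in tagged.items()
    }


# Invented example data, for illustration only.
example_log = [
    ("s1", "Manaos", "point"),
    ("s1", "Belem", "point"),
    ("s1", "Caracas", "name correct"),
    ("s2", "Manaos", "label"),
    ("s2", "Manaos", "point"),
]
example_posttest = {
    ("s1", "Manaos"): True,
    ("s1", "Belem"): False,
    ("s1", "Caracas"): True,
    ("s2", "Manaos"): True,
}
```

An item that arose in several kinds of events is counted once under each of them, which is one simple convention; other weighting schemes are equally possible.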


Results and Discussion 


The pretest scores were examined using 
a 3 X 2 (Countries X Groups) repeated 
measures analysis of variance (Winer, 
1971, p. 518). The only significant effect was 
due to groups, as the college students gen- 
erally scored higher than the high school 
students. The number of correctly answered 
questions, out of the possible 20 per pretest, 
ranged from 1 to 11 (Mdn = 4.67) for the 
college students and from 0 to 5 (Mdn = .64) 
for the high school students. Neither the 
main effect of countries, F(2, 32) = 2.62, p >
.05, nor the interaction between countries
and groups, F(2, 32) = 1.36, p > .05, ap-
proached significance. Inasmuch as none of
the subjects knew much about any of the 
countries in advance, the difference between 
pretest and posttest scores should provide a 
fair estimate of SCHOLAR's teaching effectiveness.
Moreover, since the subjects’ prior
knowledge seemed to be evenly distributed 
across countries, the relative teaching ef- 
fectiveness of the three conditions could be 
estimated through direct comparisons of the 
corresponding pretest/posttest difference 
scores.

The average increase in the number of 
correct responses from the pretest to the 
posttest was analyzed according to a 3 X 3
(Teaching Modes X Countries) confounded
factorial design. Neither the effect of
countries, F(2, 28) < 1.0, nor the interaction
between countries and teaching conditions,
F(4, 28) = 2.08, p > .05, was significant, but
the effect of training condition was strongly
significant, F(2, 28) = 6.05, p < .01. According
to a Newman-Keuls test (p < .01),
the Map-SCHOLAR condition resulted in 
significantly higher posttest scores than the 
labeled map condition, which, in turn, re- 
sulted in significantly higher scores than the 
unlabeled map condition. 

Separate analyses of the three parts of the 
posttest indicated that much of the effect of 
teaching modes occurred in the part of the 
test consisting of map-labeling questions, 
F(2, 28) = 14.09, p < .001. However, a pro- 
nounced effect of teaching mode was also 
obtained for the easier nonmap questions in 
the first part of the posttest, F(2, 28) = 5.85,
p < .01. Although the scores on the more 
difficult questions in the third part of the 
posttest were too variable to yield any sig- 
nificant effects or interactions under anal- 
ysis, the same trend was apparent. In short, 
posttest scores were consistently highest in 
the Map-SCHOLAR condition and lowest in 
the unlabeled map condition. These results 
indicate that the map system not only helped 
students learn the information necessary to 
answer the map questions in Part 2 of the 
posttest but also to answer the verbal ques- 
tions in Parts 1 and 3 of the posttest. 

An important question is whether the 
benefit of the map system extended only to 
verbal information that was explicitly stored 
about locations or whether it also extended 
to nonlocation information, such as the cli- 
mate or terrain of a place. Clearly, one would 
expect the map system to help students learn 
location information, but there are two rea- 
sons why the map system might help students
learn nonlocation information as well.


Table 1
Proportion of Correct Posttest Responses on Map and Nonmap Questions
for Each Condition

                              Type of question
Condition during training     Map       Nonmap
Unlabeled map                 34.0      46.8
Labeled map                   34.9      39.9
Map-SCHOLAR                   46.8      44.2

First, if map information showing where a 
place like Manaos is located helps the stu- 
dent remember Manaos, then nonmap facts 
about Manaos, such as its climate, may be 
more likely to be remembered. This is be- 
cause the best way to learn something is to 
relate it to information already known 
(Collins & Quillian, 1972a; Norman, 1973). 
Second, if a student sees that Manaos is on 
the Amazon, then Manaos’ climate can be 
related to any prior knowledge about the 
climate of the Amazon (e.g., that the Amazon 
flows through jungle). Thus, even nonmap 
information may be better remembered in 
a visual context. 

This idea was tested with backtrace 
analysis by separating the questions during 
training into map questions and nonmap 
questions, depending on whether the ques- 
tions were posed visually by the map system. 
Then the percentage correct on the posttest 
for the two types of presentation during 
training were plotted (see Table 1). For map 
questions, as expected, students learned 
significantly more with Map-SCHOLAR than 
with either the labeled or unlabeled maps. 
However, for nonmap information there 
were no significant differences, and students 
even did slightly better in the unlabeled map 
condition. Thus, these data suggest that the 
major benefit of the map system is in learn- 
ing information about specific locations. 
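
The comparison in Table 1 can be restated as simple differences in percentage points. The values below are copied from Table 1; the function name `advantage` and the dictionary layout are assumptions made for this sketch. The differences alone say nothing about statistical significance, which depends on the underlying item counts.

```python
# Percent correct on the posttest, copied from Table 1.
TABLE_1 = {
    "Unlabeled map": {"map": 34.0, "nonmap": 46.8},
    "Labeled map": {"map": 34.9, "nonmap": 39.9},
    "Map-SCHOLAR": {"map": 46.8, "nonmap": 44.2},
}


def advantage(table, condition, question_type):
    """Difference in percentage points between one condition and the others."""
    own = table[condition][question_type]
    return {
        other: round(own - scores[question_type], 1)
        for other, scores in table.items()
        if other != condition
    }
```

For map questions the Map-SCHOLAR condition leads both paper-map conditions by roughly 12 to 13 points, whereas for nonmap questions it trails the unlabeled map condition slightly.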

Backtrace analysis was also used to in- 
vestigate the effectiveness of repeating 
questions, depending on whether the student 
answers correctly or incorrectly. Figure 4 
shows the percentage of correct responses to 
each item on the posttest as a function of 
how frequently the students were right or 
wrong on that item during training. The in- 
creases in the curves show that the more 
frequently a student answered any item

[Figure 4 plot: percent correct on the posttest (vertical axis) against
number of questions correct during training (horizontal axis), with
separate curves for different numbers of errors during training.]

Figure 4. Percent correct on the posttest as a function
of number correct or in error during training.


correctly, the more likely it was recalled on 
the posttest. The separation of the curves for 
different numbers of errors shows that the 
more frequently an item was missed, the less 
likely it was recalled on the posttest. This 
simply reflects the fact that the items that 
were more difficult to learn were likely to be 
missed more frequently. The concave shape 
of the curves indicates that the repeated 
presentation of a correct item has a de- 
creasing effectiveness. The implication is 
that as much as possible, training time 
should be allocated to those items that the 
student has correctly answered least 
often. 
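
The tabulation behind Figure 4 and the allocation rule suggested above can be sketched as follows. Both functions, their names, and the record layout are assumptions made for illustration; they are not part of SCHOLAR. The first groups items by their training history, and the second selects the items that have been answered correctly least often.

```python
from collections import defaultdict


def percent_correct_by_history(records):
    """Percent correct on the posttest for each training history.

    records: iterable of (n_correct, n_errors, posttest_correct) tuples,
        one per student-item pair.
    """
    totals = defaultdict(lambda: [0, 0])  # history -> [hits, count]
    for n_correct, n_errors, hit in records:
        cell = totals[(n_correct, n_errors)]
        cell[0] += int(hit)
        cell[1] += 1
    return {key: 100.0 * hits / n for key, (hits, n) in totals.items()}


def next_items(correct_counts, k=1):
    """Pick the k items answered correctly least often during training.

    correct_counts: dict mapping item -> number of correct answers so far.
    Ties are broken alphabetically so that the result is reproducible.
    """
    ranked = sorted(correct_counts, key=lambda item: (correct_counts[item], item))
    return ranked[:k]
```

A tutor using `next_items` would spend its remaining time on the items where one more correct answer is expected to help most, in line with the diminishing returns shown by the curves.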

When students missed items in answering 
a question, SCHOLAR provided additional 
elaboration about some of the items missed. 
For example, in Figure 3 when the student 
did not know about Lake Maracaibo, 
SCHOLAR mentioned the oil there as an 
elaboration about Lake Maracaibo. The 
backtrace analysis showed that percent 
correct on the posttest increased from 34%
when there was no elaboration of an item
during training to 47% when there was one
elaboration. This increase is significant (t =
4.01, p < .01), indicating that elaboration
does help students to learn the material 


better. After one elaboration, the percent
correct stabilized, indicating that further
elaborations are of little benefit.

We used a variation of backtrace analysis
to determine which kinds of map questions
are most effective for learning. In the map
system there were three different kinds of
map questions that might be asked: (a)
pointing questions, for which SCHOLAR
mentioned one or more places and asked the
student to point at them; (b) naming questions,
for which SCHOLAR blinked one or
more places and asked the student to name
them; and (c) pointing and naming questions,
for which SCHOLAR asked the student
to name a set of places, such as the rivers of
Brazil, and point to them in the order
named.

Table 2 shows the percent correct on the
second occurrence of a map question about
any item as a function of the type of question
that was asked on the first occurrence of the
item. There were not enough data for naming
questions, so they are not shown. The
column totals indicate that students did
better on pointing questions than on
pointing and naming questions, as would be
expected because pointing questions are
easier. However, the row totals show that
students did better on the second question
if the first question required both pointing
and naming than if it required only pointing,
χ²(1) = 4.75, p < .05. Evidently, students
learn more from pointing to and naming a
location than from just pointing to it.
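
As a check on the reported significance level, the tail probability of a chi-square statistic with one degree of freedom can be computed directly from the standard library. The function name is an assumption made for this sketch; the identity it relies on is that a chi-square variable with one degree of freedom is the square of a standard normal deviate.

```python
import math


def chi_square_p_value_1df(statistic):
    """Upper-tail probability of a chi-square statistic with 1 df.

    With one degree of freedom the statistic is the square of a standard
    normal deviate, so P(X > x) = erfc(sqrt(x / 2)).
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))
```

For the reported value of 4.75, the function gives a probability of roughly .03, which is consistent with the stated p < .05.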


Table 2
Effect of Different Types of Map Questions During Training

                               % correct on 2nd occurrence
Type of question                            Naming and
on 1st occurrence              Pointing     pointing
Pointing                       49           33
Naming and pointing            61           51
Column total                   53           41


Conclusion


The experiment showed that students
learned significantly more with the interactive
map display than with either a static
labeled or unlabeled map. The advantage of
Map-SCHOLAR cannot be attributed solely
to the ability of the student to locate places 
spatially, since the labeled map condition 
allowed the student to identify places just as 
effectively. The advantage of Map-SCHOLAR 
also cannot be attributed to novelty or some 
other generalized facilitation effect, because, 
as backtrace analysis showed, the effect was 
specific to location information and did not 
carry over to nonlocation information. The 
advantage therefore must have been due 
mainly to the dynamic aspects of Map- 
SCHOLAR and its ability to focus the stu- 
dent’s attention on the relevant map infor- 
mation. 

The experiment also demonstrated the 
usefulness of the backtrace analysis tech- 
nique for evaluating computer-assisted in- 
struction systems. Backtrace analysis is not 
dependent on the type of information being 
taught and is thus transferable to com- 
puter-assisted instruction systems other 
than SCHOLAR. Of course, the specific tags 
used to mark the data would change, de- 
pending on the different teaching strategies 
and training events that are being evaluated. 
The ability to perform fine-grain analyses of 
the effectiveness of different teaching 
strategies is a valuable tool for future educational
research.


Thematic Analysis: An Empirically Derived Measure of the
Effects of Liberal Arts Education 


David G. Winter 


Wesleyan University 


The rationale and development of the Test of Thematic Analysis, reflecting 
ability at complex concept formation, are presented. Subjects were asked to 
formulate and articulate the differences between two groups of stories. Scoring 
categories were derived from differences in freshman and senior responses at 
a traditional liberal arts college and were cross-validated on another sample. 
In a second study, thematic analysis scores were significantly higher among seniors
than among freshmen at the traditional liberal arts college, but there
were no significant differences at two more vocationally oriented colleges. 


Partisans of the liberal arts college are 
convinced that liberal education produces or 
facilitates important kinds of cognitive and 
intellectual growth in students. Students are 
said to learn “to think effectively, to com- 
municate thought, to make relevant judg- 
ments, to discriminate among values” 
(Committee on the Objectives, 1945, p. 65). 
Especially since World War II, an enormous 
national investment in liberal arts education 
has been made in the faith that cognitive 
gains of this sort (among other benefits) 
would be reaped in “liberally educated”¹
graduates. 

Yet there has been little evidence, or even 
systematic research, to support the faith of 
liberal arts educators. Bird (1975) argued 
that the claims of liberal education have not 
been systematically tested 


because the liberal arts are a religion, the established 
religion of the ruling class. The exalted language, the 
universal setting, the ultimate value, the inability to 
define, the appeal to personal witness, the indirectness, 
the aphorisms—these are all the familiar modes of re-
ligious discourse. (p. 109)


Part of the problem, as Bird noted, is that 
the “ability to think,” as that term is used by
Harvard University


liberal educators, is not at all like the kind of
“thinking” studied by cognitive psychologists.
Since the cognitive claims of liberal
education have not been adequately
operationalized, the studies that have been carried
out on the effects of liberal arts education
have concentrated on values, beliefs, and
similar kinds of personality dispositions.
Often little change is found even on these
dimensions (see, e.g., Feldman & Newcomb,
1969; Jacob, 1957; Sanford, 1964; Ta...,
1974).

Perhaps it is useful to look again at the list
of goals cited above. Taken together, these
goals suggest an ability to form complex
concepts that will discriminate among
abstract data and an ability to communicate
these concepts or put them into clear
language. To the liberally educated person, the
environment furnishes data, whether from
a scientific experiment, observation of
human social and economic activity, the
viewing of a painting, or simply everyday life.
The data as presented are chaotic, confusing,
and seemingly random; or even worse, they
are “wrongly” labeled with misleading and
useless concepts. The liberally educated
person, it is argued, transforms these data
into information that is intelligible and


¹ “Liberal” or “liberal arts” education contrasts with
“vocational” or “technical” education. Its roots go back
to Plato’s Republic. From about 1935-1965, it enjoyed
a great vogue in the United States, but in recent years
it has been variously attacked as ethnocentric, elitist,
or useless.


useful by creating and using categories. The
complex is restated in terms of its simple, 
ordered component parts. This categorizing 
activity has three important effects: (a) It 
reduces the complexity and, therefore, the 
necessity for continual new learning. (b) It 
increases the precision and effectiveness of 
instrumental action. (c) It advances the 
codification and coherence of our knowledge 
(see Bruner, Goodnow, & Austin, 1956, pp. 
12-15). 

Even though psychologists have studied 
concept formation in the laboratory for over 
50 years, ever since Hull's (1920) pioneering 
effort, most of this research was designed to 
shed light on the basic processes involved in 
coding and storing information and the ele- 
mentary conditions that influence concept 
formation and concept attainment. There- 
fore, the concepts that were used were simple 
ones, combinations of a few clearly defined 
and discrete attributes, each of which would 
vary in a few ways clearly defined in advance 
by the experimenter (Bruner et al., 1956; 
Hanfmann & Kasani, 1937; Heidbreder, 
1946). It is not surprising, therefore, as Bird 
(1975) noted, that the experimental litera- 
ture on concept formation has not been very 
useful in conceptualizing and measuring the 
effects of liberal education. 

The notion of cognitive complexity has 
stimulated a good deal of theory and re- 
search over the past 20 years. There is now 
some agreement that cognitive complexity 
(also called information-processing ability) 
has at least two distinguishable components: 
(a) differentiation, or the number and vari- 
ety of independent dimensions used in 
categorizing (Bieri, 1961, 1966), and (b) in- 
tegration, or the complexity with which rules 
and concepts are combined and used in 
thinking. One of the most widely used mea- 
sures of integrative complexity is based on 
the Paragraph Completion Test (PCT) de- 
veloped by Schroder and his colleagues 
(Schroder, Driver, & Streufert, 1967). Al- 
though these variables seem better suited to 
the level and kinds of cognitive activity that 
occur in a collegiate liberal arts education, 
they have usually been studied as personality 
or cognition processes in their own right (e.g.,
MacNeil, 1974) or as predictors of attitudes 
or performance on a variety of experimental
laboratory tasks (Gardiner & Schroder,
1972). 

There is some evidence that integrative 
complexity increases with age (Hunt, cited 
in Schroder et al., 1967, p. 130). PCT scores 
do vary by academic field, and they are as- 
sociated positively with grades in the 
humanities and social sciences and nega- 
tively with grades in engineering (Pohl & 
Pervin, 1968; Russell & Sandilands, 1973). 
The kinds of educational environments that 
ought to foster integrative complexity have 
been described (Schroder et al., 1967, pp. 
45-53); and although the description does fit 
the liberal arts college, there has been little 
direct testing of whether integrative com- 
plexity scores go up as a result of that kind 
of education, as distinct from effects of 
general maturation. 


Study 1 


A New Approach 


This article presents research on the de- 
velopment and validation of a new kind of 
measure of the effects of liberal education. 
This new measure reflects the ability to form 
and articulate complex concepts and then 
the use of these concepts in drawing con- 
trasts among examples and instances in the 
real world. Developing such a measure in- 
volves two separate problems, and each 
problem requires a brief discussion. 

Designing a complex concept-formation 
task. Consider a typical college exam 
question, such as: "Compare and contrast 
the Renaissance and the Reformation." Such 
a question may illustrate the effects of liberal 
education, but it would not be a fair general 
measure. It presumes experience in historical 
analysis and the knowledge of some facts 
about the Renaissance and the Reformation. 
Even if there is a common “core” of knowl- 
edge to liberal education, this question 
would probably give the history major an 
unfair advantage over the psychology major. 
Yet, making up different questions for each 
course of study forfeits comparability across 
all students. A task of concept formation is 
needed that does not involve material drawn 
from any particular discipline; yet, the 
material must be complex enough to permit the 
formation of complex and abstract con- 
cepts. 

Such a task would be very similar to the 
process by which researchers in the tradition 
of experimental analysis of fantasy (or 
“thematic psychology”) develop systems for 
scoring motives and related variables. The 
researcher typically compares two sets of 
Thematic Apperception Test (TAT) stories, 
written under different conditions, and de- 
velops a scoring system to capture the dif- 
ferences (see Atkinson, 1958; Winter, 1973). 
The Test of Thematic Analysis, which is 
described in this article, is a simplified ad- 
aptation of the same procedure. Persons 
taking the test are given two groups of short 
TAT-type stories and are asked to formulate 
and describe the differences between them, 
in whatever terms and at whatever level and 
length they wish. Although the general kind 
of discrimination asked for is familiar, the 
specific task is novel to all subjects; and the 
“facts” that are to be dealt with in formu- 
lating the comparison are merely the TAT 
stories that are simple and do not involve any 
specialized knowledge or experience on the 
part of the subject. Thus the general form 
of the task does not appear to give any ad- 
vantage to particular academic backgrounds 
or course experience. Because the original 
form of this complex task of concept forma- 
tion requires subjects to differentiate, at an 
abstract (and presumably thematic) level, 
between two groups of stories, it has been 
named the Test of Thematic Analysis and 
is so designated in the rest of this article. 

Scoring the Test of Thematic Analysis. The Test of Thematic Analysis 
requires a highly complex kind of concept 
formation, and there is no “single, correct” 
way to differentiate the two groups of stories. 
There is, moreover, no clear consensus about 
the actual effects (as opposed to hoped-for 
goals) of liberal education. For these reasons, 
it is difficult to decide in advance just how 
the test should be 


sents an empirically derived measure of the 
effects of liberal education on the formation 
and articulation of complex abstract con- 
cepts. 

Method 


Deriving the Thematic Analysis Measure 


Subjects. Subjects for the initial derivation of the 
thematic analysis measure were 12 male freshmen and 
12 male seniors at a high-quality, high-prestige 
liberal arts college located in New England. (The college 
will subsequently be referred to as Ivy College.) They 
were recruited to take a series of “new tests 
being developed to measure cognitive abilities of 
students.” Subjects were instructed to use a code 
instead of their names on all test materials for 
anonymity. 

Procedure. All subjects were given a Test 
of Thematic Analysis, with instructions as follows: 


This test is designed to measure your ability to 
read, interpret, and analyze material, and then 
synthesize new abstract concepts which m 
what you have analyzed. 


On the following pages are brief imaginative stories 
which different people have written about the same 
picture—a picture of several men grouped around a 
table. These stories are divided into two different 
groups of four stories each: Group A and Group B. 


Your task is to study the two groups of stories carefully 
and to figure out what you feel are the differences 
between the two groups of stories—different 
themes, elements, features of style, or whatever— 
which are present (or largely present) in one group 
and absent (or largely absent) in the other group. 


Finally you are to describe these differences and 
write them up, in any manner and form that you 
like. You can use the blank pages following after the 
stories, and extra paper is available if you need it. 


You will have 30 minutes for this test. You will be 
advised when you have about 10 minutes left and 
minutes left, so that you can put your account of the 
differences between Groups A and B in the final 
form which you feel is most satisfactory, appropriate, 
and coherent. 


It might be thought that experience or coursework 
either literature i 


in 


tion, other kinds of material can readily be substituted 
for the TAT stories with which the test was developed. 
The rest of the test consisted of a page with the four 
Group A stories, a page with the four Group B stories, 
and loose sheets of blank paper. As subjects handed in 
their responses, all papers were stapled together. Where 
subjects had scratched out preliminary notes or in other 
ways indicated a “final version” of their response, only 
this final version was analyzed and scored. The eight 
brief TAT stories were taken from the “hypnosis” 
power-arousal experiment of Uleman (1966; see Winter, 
1973, pp. 62, 71), Group A from the power-aroused 
condition and Group B from the neutral condition. 
These stories were selected because there were major 
differences in the writing conditions and content be- 
tween the two groups. 

Five freshman and five senior protocols were ran- 
domly selected from the responses, and the thematic 
analysis scoring system was developed by comparing the 
freshman and senior responses and noting the differ- 
ences between them.3 Nine categories, listed below, are 
the result. Categories present more often among seniors 
are scored +1; those present more often among fresh- 
men are scored —1. This scoring system was then 
cross-validated on the remaining response protocols, as 
described in the Results section. 


Outline of the Thematic Analysis Scoring 
System4 


The following three terms are used in a special and 
precise sense in the scoring system: Element—an abstract 
description of themes or other characteristics of 
one or more stories, usually of the same group. Examples 
are "self-interest"; "incomplete sentences." Dichot- 
omy—two or more elements that are alternative 
manifestations of an issue (see below). Even though 
they are usually opposites, the elements of a dichotomy 
do not have to be completely contradictory. Examples 
are “self-interest vs. social interest”; “simple vs. complex 
sentences.” Issue—an abstraction that contains at least 
two (or more) elements, so that a dichotomy is possible. 
Examples are: “morality”; “sentence structure." Each 
category is scored only once, no matter how often it 
occurs in a response. 

Direct compound comparisons (scored +1). The 
response gives a clear, direct comparison between the 
two groups of stories by ascribing an element to one 
group and either explicitly not ascribing it to the other 
group or explicitly ascribing a contrasting element to 
the other group. Vague or relative comparisons are not 
scored. Example: “Group A stories involve acceptance 
of authority, while Group B stories involve rejection of 
authority.” 

Exceptions and qualifications (scored +1). Ex- 
ceptions or qualifications to an element, dichotomy, or 
issue are mentioned. 

Examples (scored +1). Examples are quoted or 
cited from the stories to illustrate an issue, dichotomy, 
or element. 

Analytic hierarchy (scored +1). This consists of 
statements approximating the following ideal form: 
explicit mention of an overarching issue, containing a 
dichotomy of at least two elements, one of which is 
ascribed to one group and the other of which is ascribed 
to the other group. Example: “All stories involve 
relations to authority [issue]; the contrast being between 
acceptance and rejection of it [dichotomy]. Group A 
stories involve accepting authority, while Group B 
stories involve rejecting authority." 

Redefinition (scored +1). This occurs by altering 
or redefining an issue, dichotomy, element, or even the 
meaning of a story itself to broaden the "coverage" of 
a feature to apply to more stories than before or to in- 
crease the precision or sharpness of the contrast be- 
tween the two groups. 

Subsuming alternatives (scored +1). An element 
is defined disjunctively by several nonsynonymous but 
functionally equivalent alternatives. Example: “Group 
A stories involve either timid acceptance of authority 
or active rejection of it, while Group B stories involve 
either moderate suspicion of authority or indifference 
to it.” 

“Apples and oranges" (scored —1). A comparison 
that is not really a comparison—as if apples and oranges 
were being compared. More formally, this is when the 
elements of a dichotomy are unrelated instead of op- 
posed or contradictory. Example: “Group A stories in- 
volve accepting authority while Group B stories involve 
sports.” 

Affect (scored —1). An element or dichotomy is 
based on the reader's emotional reaction to the story 
rather than the story itself. Example: “Group A stories 
are more interesting than Group B stories.” 

Subjective reaction (scored —1). An element or 
dichotomy is based on the writer's own reaction, so that 


3 The scoring categories were developed “empirical- 
ly," looking for whatever differences were there. This 
process was not guided by any conscious theory, other 
than the author's general sense and personal experience 
of what happens in liberal education. The gradual re- 
finement of these differences into the explicit, related 
set of categories presented here was undoubtedly in- 
fluenced by a variety of theoretical elements; but again, 
it was not guided by any one specific, conscious theo- 
ry. 
4 This outline is for purposes of illustration and is not 
adequate for actual scoring purposes. A full version, 
including examples, sample response protocols, and 
practice materials for learning to score may be obtained 
from the first author at the address given for re- 
prints. 

There are certain similarities between some of the 
thematic analysis categories and integrative complexity 
as scored on the PCT, but there are important theo- 
retical and practical differences between the two mea- 
sures as well. Thematic analysis was derived empirically 
in an explicit attempt to capture the effects of liberal 
education, whereas the integrative complexity measure 
was derived from more theoretical considerations. It is 
scored along a 7-point scale, whereas thematic analysis 
involves nine binary decisions about the presence or 
absence of each element. Taken together, these nine 
elements are heuristic in that they suggest different 
components or aspects rather than a single linear skill. 
Of course, the empirical relation between the two 
measures can only be determined by further re- 
search. 
Table 1 
Thematic Analysis Category and Total Scores in Derivation and Cross-validation Samples 

                              % with 1+ instance of category 
Sample                       1    2    3    4    5    6    7    8    9 
Derivation groups 
  Freshmen a                60   20   40   20   20    0   60   60   20 
  Seniors a                100   60   80   40   60   80    0   20    0 
Cross-validation groups 
  Freshmen b                71   43   43   29    0   29   57   57   14 
  Seniors b                100   71   71   57   14   57    0   29    0 

a n = 5. 
b n = 7. 
* t = 3.01, p < .01, one-tailed. 


first-person pronouns are used. Example: “I like Group 
A stories better than Group B stories." 
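
The scoring rule outlined above (six categories scored +1, three scored −1, each counted at most once per response) can be sketched in a few lines. This is only an illustration under stated assumptions: the category numbers are taken from the order in which the categories are listed, the short labels are paraphrases, and the function name `total_score` is hypothetical. It is not a substitute for the full scoring manual.

```python
# Category numbers are assumed to follow the order of the outline;
# the labels are shorthand, not official names.
POSITIVE = {
    1: "direct compound comparison",
    2: "exceptions and qualifications",
    3: "examples",
    4: "analytic hierarchy",
    5: "redefinition",
    6: "subsuming alternatives",
}
NEGATIVE = {
    7: "apples and oranges",
    8: "affect",
    9: "subjective reaction",
}


def total_score(categories_present):
    """Return the total score for one response.

    `categories_present` is any iterable of category numbers judged
    present by the scorer. Repeats are ignored, because each category
    is scored only once no matter how often it occurs.
    """
    present = set(categories_present)
    unknown = present - (set(POSITIVE) | set(NEGATIVE))
    if unknown:
        raise ValueError(f"unknown category numbers: {sorted(unknown)}")
    plus = sum(1 for c in present if c in POSITIVE)
    minus = sum(1 for c in present if c in NEGATIVE)
    return plus - minus
```

Under this rule a total score can range from −3 (only the three negative categories present) to +6 (only the six positive categories present).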

Most of the categories of the thematic analysis scoring 
system have face validity because they directly involve 
precision and clarity of expression. They will be familiar 
to anyone who has graded student papers and essay 
examination answers as capturing the elements of 
“good” (Categories 1-6) and “bad” (Categories 7-9) 
answers. Connections can readily be made between 
these categories and the usual statements of goals of 
liberal arts colleges. Describing and defining the cate- 
gories in this way, however, has several advantages. 
These definitions are clear. These categories actually 
change as a result of college education (see below). Fi- 
nally, these categories can be quickly and easily learned 
by scorers. Working independently and with only brief 
instruction, one scorer who had no previous experience 
attained very high agreement with the first author’s 
scoring (category agreement = .90; see Winter, 1973, p. 
248; rho on total scores = .85).5 


Results 


Cross-validation of the Thematic 
Analysis Scoring System 


To cross-validate the scoring system, the 
remaining 14 freshman and senior responses 
were mixed together and scored without 
knowledge of the subjects’ college class. 
Table 1 presents the results for both the 
derivation and cross-validation samples. In 
the cross-validation sample, each of Cate- 
gories 1-6 was present more often among 
seniors, whereas each of Categories 7-9 was 
present more often among freshmen. The 
mean of the senior total scores was significantly 
higher than the mean of the freshman 
total scores. 


Three-College Study: Study 2 


To confirm and extend the results at Ivy 
College, a large-scale study of freshman and 
senior students was carried out at Ivy College 
and two other institutions located in the 
same geographical region. The first, referred 
to as Teachers College, is a 4-year 
supported institution, relatively m 
tive, and enrolling mostly lower-middle-class 
commuter students who are preparing for 
specific vocations such as teaching. The 
second, referred to as Community College, 
is a 2-year institution serving students of 
about the same background as Teachers 
College but having more of a liberal 
orientation. Here the two samples were 
freshmen and second-year students. 


Method 


Subjects and Procedure 


At each college, male and female subjects were recruited 
by advertisement and by direct contact and were 
offered $12 for spending 3 hours “taking various kinds 
of new and interesting tests,” which included the Test 
of Thematic Analysis, a TAT, and numerous other 
procedures (see Winter, McClelland, & Stewart, in 
press). All subjects were told that the purpose of testing 
was to design and improve some new tests, and that 
their responses would be confidential and unavailable 
to officials at their college. By accident, the seniors at 
Ivy College had higher Scholastic Aptitude Test (SAT) 
scores than the freshmen, so the two groups were 
matched by randomly discarding some of the higher 
scoring seniors and lower scoring freshmen. Once this 
was done, the freshman and senior samples within each 
college were closely matched on SAT scores, fathers’ and 
mothers’ education, proportion of working mothers, and 
plans to teach. It should be further noted that attrition 
is very low at Ivy College. This, together with the 
matching procedure, makes attrition an unlikely 
explanation of any freshman-senior differences at any of 


5 Since this reliability study was done, the instruction 
procedures for scoring have been elaborated and 
improved, as noted above. 
Table 2 
Thematic Analysis Scores of Freshmen and Seniors at Three Colleges 

                             Uncorrected                Corrected for length a 
Sample                    M      SD      t              M      SD      t 
Ivy College 
  Freshmen (121)         1.22   1.50                    .63   1.41 
  Seniors (100)          2.00   1.54   [illegible]     1.21   1.36   3.08* 
  Difference              .78                           .58 
Teachers College 
  Freshmen (30)           .17   1.21                    .01   1.29 
  Seniors (38)            .50   1.31   1.06             .37   1.29   [illegible] 
  Difference              .33                           .38 
Community College 
  Freshmen (30)           .40   1.25                    .56   1.09 
  Seniors (32)            .41   1.09   <1               .57   1.06   [illegible] 
  Difference              .01                           .01 

Note. Numbers in parentheses are ns. 
a Corrected for correlation of score with number of words in response by subtracting from raw score the predicted score given 
number of words and regression of score on number of words. Formula: Predicted score = .0075 actual score − .306; with a constant 
of 1.00 added to achieve positive sample means. 
* p < .003. 
** p < .001. 


the colleges. Finally, comparison of three colleges makes 
it possible to rule out simple maturing as an explanation 
of a change that occurs at only one college. 


Results 


College Effects 


Table 2 presents the mean scores for 
freshmen and seniors at the three institu- 
tions. Since there were no differences in 
scores and differences for men and women, 
the data for both sexes were combined. 
There was a significant and substantial 
correlation between the score of a thematic 
analysis response and the number of words 
that it contained (r = .46, N = 388, p < .001, 
for the entire sample). On theoretical 
grounds alone, it is not clear whether scores 
should be corrected for length of response. 
Longer answers in this sort of task may be an 
intrinsic part of improved thematic analysis 
ability rather than a "spurious" factor re- 
sponsible for the score improvement. (In the 
language of social surveys, length of response 
may be part of a “developmental sequence" 
rather than "spurious;" see Hyman, 1955, pp. 
254-274; and Blalock, 1964.) Therefore, 
Table 2 presents results for both uncorrected 
scores and for corrected scores, in which the 
effect of the correlation with length was re- 
moved by subtracting the score predicted 
from the equation for the regression of score 
on length from the raw score. 
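
The length correction described here amounts to taking a residual from a regression line. The sketch below uses the constants printed in the note to Table 2 (.0075, −.306, and the added constant of 1.00). It assumes that the .0075 slope multiplies the number of words in the response, which is what a regression of score on length implies; the function and variable names are hypothetical.

```python
# Constants as printed in the note to Table 2.
SLOPE_PER_WORD = 0.0075   # assumed to multiply the number of words
INTERCEPT = -0.306
ADDED_CONSTANT = 1.00     # added so that sample means come out positive


def predicted_score(n_words):
    """Score predicted from response length alone."""
    return SLOPE_PER_WORD * n_words + INTERCEPT


def corrected_score(raw_score, n_words):
    """Raw score minus the length-predicted score, plus the constant."""
    return raw_score - predicted_score(n_words) + ADDED_CONSTANT


if __name__ == "__main__":
    # Two hypothetical responses with the same raw score but different lengths:
    # the longer one receives the lower corrected score.
    for words in (100, 300):
        print(words, round(corrected_score(2, words), 3))
```

A response that scores exactly what its length predicts ends up at 1.00 after correction, so corrected scores above 1.00 indicate more scored categories than length alone would suggest.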


Using either the uncorrected or the cor- 
rected scores, the results of the original re- 
search at Ivy College have been confirmed 
with a much larger sample. Thematic anal- 
ysis scores increase significantly from 
freshman to senior year. Thematic analysis 
scores did go up at Teachers College, but not 
significantly, whereas there was no differ- 
ence between first- and second-year students 
at Community College. (The unequal num- 
bers make a two-way analysis of variance 
unwieldy if not impossible.) These results 
suggest several conclusions: (a) The increase 
at Ivy College cannot be explained as a sim- 
ple age or maturing effect, since it was much 
greater than the changes that occurred 
elsewhere. (b) The higher attrition rates at 
the other two colleges cannot account for the 
results as long as it is assumed that students 
with higher thematic analysis scores would 
not be more likely to drop out, and they 
might be less likely to do so. (This assump- 
tion cannot be tested with the data at hand, 


6 The uncorrected difference—.78—is a good deal 
smaller than that observed in the original samples 
shown in Table 1. This kind of shrinkage often occurs 
in empirical derivation research of this kind and is 
probably caused by several things: The very small initial 
samples may have capitalized on chance differences; the 
scorers of the larger sample may have been less careful; 
and the sampling of subjects for the initial study may 
have been less systematic. The important point is that 
the same differences continue to hold up. 
but it seems reasonable.) In other words, the 
greater attrition rates at Teachers College 
and Community College tend to act as a 
conservative factor in the present case. (c) 
Therefore, the educational experience at Ivy 
College is very likely responsible for this in- 
crease. It is, of course, difficult to pinpoint 
specific features of the college that might be 
responsible for this effect, but the most 
plausible factor would be the traditional 
liberal arts and abstract conceptual em- 
phasis of Ivy College, in contrast to the 
skill-oriented “vocational” atmosphere of 
the other two institutions. There is, then, 
strong evidence that the Test of Thematic 
Analysis measures some of the important 
effects of liberal education, at least at one 
institution. 
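
The freshman-senior comparisons in Table 2 can be checked from the summary statistics alone. The sketch below assumes an ordinary pooled-variance (Student) t test for two independent groups; the article does not say which form of the test was used, and the function name `pooled_t` is hypothetical. The example values are the Ivy College length-corrected figures as they appear in Table 2 (freshmen: M = .63, SD = 1.41, n = 121; seniors: M = 1.21, SD = 1.36, n = 100).

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t for two independent groups, from summary statistics.

    Returns (t, df), with t positive when group 2 has the higher mean.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / std_err, df


if __name__ == "__main__":
    # Ivy College, scores corrected for length (values from Table 2).
    t, df = pooled_t(0.63, 1.41, 121, 1.21, 1.36, 100)
    print(f"t({df}) = {t:.2f}")
```

Because the inputs are rounded to two decimals, the result agrees with the tabled value of 3.08 only approximately.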

Inspection of the correlations of thematic 
analysis and college major among the Ivy 
seniors makes it possible to rule out a bias on 
the test in favor of those with experience in 
either English, languages and literatures, or 
psychology. The point-biserial correlations 
between majoring in these subjects and 
thematic analysis score were .07, .11, and .02, 
respectively. Among the seniors, thematic 
analysis was correlated with majoring in 
mathematics, physics, or engineering (r = 
.20, p < .05), but since this is almost a sig- 
nificant increase from the zero correlation 
with these (intended) majors among fresh- 
men (r = .00), one could argue that those 
departments at Ivy College develop thematic 
analysis skills, rather than that the test is 
biased toward those areas of experience. 
Thematic analysis scores were not highly 
correlated with combined Verbal and 
Mathematical SAT scores (rs = .18 and .14 
for the Ivy freshmen and seniors, respec- 
tively). 


Operant Versus Respondent Concept 
Formation 


The Test of Thematic Analysis requires 
subjects to generate and articulate their own 
complex concepts. In theory, therefore, it is 
an “operant” test that is different from the 
traditional “respondent” tests of concept 
attainment that, after giving subjects 
positive and negative instances, ask them to 
identify further examples of the concept (e.g., 
Bruner et al., 1956; 
Heidbreder, 1946). To check whether these 
two approaches were also different empirically, 
62 freshmen and 52 seniors at Ivy 
College were also given an “objective” supplement, 
in which they were asked to classify 
eight additional stories as belonging either 
to Group A or Group B. The stories were in 
fact drawn from the same two conditions as 
the eight stories on the regular test, so that 
it was possible to score “concept attainment” 
in the usual way. This score was not highly 
correlated with the thematic analysis score 
among the freshmen (r = .16) or the seniors 
(r = .11), and it showed virtually no gain 
from freshman to senior year (t for difference 
= .09). Thematic analysis was not 
highly correlated with the Heidbreder 
measure of concept attainment (rs = [illegible] and 
.07 for the Ivy freshmen and seniors, 
respectively). 

The thematic analysis score, an operant 
measure of concept formation, is thus 
conceptually and empirically distinct from 
traditional, respondent measures of concept 
formation. Whatever ability the test 
measures is not picked up by the objective 
procedure. This is not surprising in view of 
McClelland's (1966) discussion of the 
operant-respondent issue in psychological 
measurement. 


A “Good” Response and a “Correct” 
Response 


Although these results suggest that the Ivy 
students’ improved ability to form and 
articulate abstract concepts is not the same 
thing as their (not necessarily improved) 
ability to “attain” concepts, the reader may 
still wonder whether the thematic analysis 
responses of the Ivy seniors were more likely 
to be correct. After all, every teacher knows 
the difference between a good answer and a 
correct answer. Since the two groups of 
stories used in the present study varied in 
many possible ways, there is no basis for 
judging the accuracy of responses in any 
rigorous way. Still, since the two groups of 
stories were in fact drawn from “power-aroused” 
and “power-neutral” conditions 
and do therefore differ in need for power 
score, one could argue that any thematic 
analysis response that included “power” as 
the basis (or one basis) of the distinction 
would be correct in some sense. At Ivy Col- 
lege, the seniors did include significantly 
more power themes in their responses than 
the freshmen (Ms = 2.20 and 1.87, respec- 
tively); t (219) = 2.20, p < .03, even though 
there were no class differences at the other 
two colleges. Further, among freshmen, 
thematic analysis and number of power 
themes were uncorrelated (r = —.05), 
whereas among the Ivy seniors there was a 
significant association (r = .34, p < .001). 
This suggests a further refinement in the 
formulation: The liberal education of Ivy 
College improved the ability to form and 
articulate concepts, sharpened the accuracy 
of concepts, and tended to fuse these two 
component skills together. 


Discussion 


This article has introduced the Test of 
Thematic Analysis as a measure of the ef- 
fects of liberal education, as distinct from 
other kinds of education and general mat- 
uring. Although these results may be of in- 
terest, they are drawn from the study of only 
one traditional liberal arts college. The next 
step is to determine whether the same effects 
are to be found at other liberal arts colleges. 
Are they in fact due to liberal or general ed- 
ucation as such, or are they only idiosyn- 
cratic to Ivy College? Can the same effects 
be found at other colleges that, although 
committed to liberal (versus vocational) 
education, are less traditional and more in- 
novative? Can this ability to form and arti- 
culate concepts be taught directly in courses 
designed to improve “critical thinking”? 
Research to answer these important ques- 
tions is currently underway. 

Further research is also necessary to es- 
tablish the psychometric credentials of the 
Test of Thematic Analysis. In addition to the 
usual questions about reliability and corre- 
lation with other measures such as Bieri’s 
measure of cognitive differentiation or the 
Paragraph Completion Test, we also need to 
know whether the ability measured by the 
test predicts performance in academic work. 
Finally, is this ability—perhaps fostered by 
liberal education—associated with successful 
outcomes in the world at large? Research 
to answer these questions, including a long- 
itudinal study of the seniors in the present 
research, is also underway. 

In the version reported here, the Test of 
Thematic Analysis asked subjects to com- 
pare and contrast two groups of TAT stories. 
TAT stories were used because they give no 
special advantage to particular educational 
experience and coursework, but any two 
passages of material—or two concepts—can 
be used as the basis of the thematic analysis 
task. At the same time, the scoring categories 
are general, so that they can be applied to the 
kinds of comparisons and contrasts called for 
by most essay questions. For example, an- 
swers to the following kinds of questions, 
drawn from quite different fields, could be 
scored with the present system: (a) Distin- 
guish between Freud's and Rogers' theories 
of the normal personality. (b) What are the 
essential differences between normal and 
malignant cells? (c) Compare and contrast 
Moliére’s Don Juan and Mozart's Don Gio- 
vanni. The principles and scoring categories 
of the Test of Thematic Analysis could, 
therefore, have wide use in the study of 
complex concept formation and develop- 
ment of conceptual abilities through edu- 
cational programs. 
Preschool Measures of Self-esteem 
and Achievement Motivation as Predictors 
of Third-Grade Achievement 
Longitudinal data on 404 children from predominantly low-income areas in 
three regionally distinct sites were used to determine (a) the relation of pre- 
school, kindergarten, and first-grade measures of self-esteem and achievement 
motivation (primarily the Brown IDS Self-Concept Referents Test and 
Gumpgookies) to reading, mathematics, and problem-solving (Raven Col- 
oured Progressive Matrices) performance in the third grade and (b) whether 
such measures can improve on predictions made solely from an early achieve- 
ment measure (Caldwell’s Preschool Inventory). Although the early self-es- 
teem scores had a strong negative skew, they contributed significantly to pre- 
dictions of third-grade performance. However, the predictive variation in the 
scores may have represented differences in task understanding and attentive- 
ness rather than differences in self-esteem. Achievement motivation scores, 
especially in the year prior to entrance into first grade, contributed signifi- 
cantly to predictions of later achievement. Results varied somewhat by sex, 
socioeconomic status, and geographical site. 


Standard preschool achievement tests 
have been found to be only somewhat pre- 
dictive of later academic performance. Since 
children’s school performance is influenced 
not only by what they know but by their at- 
titudes and motives, consideration of vari- 
ables from the affective domain (e.g., 
achievement motivation, self-esteem) should 
improve predictions of academic success. If 
such variables were found to be important 
predictors, either by themselves or in com- 
bination with other variables, they might be 
valuable in the early identification of chil- 
dren likely to experience difficulties in aca- 
demic achievement. Furthermore, more 
complete knowledge of the relation of these 
affective-social variables to later school 
achievement should help guide the imple- 
mentation of Head Start and other preschool 
programs designed to facilitate later 
achievement by encouraging the child's de- 
velopment in these areas. For example, the 
finding that individual differences in early 
measures of self-esteem are predictive of 
later academic achievement would provide 
additional support for increased and sys- 
tematic efforts to raise self-esteem. Simi- 
larly, a preschool program that claimed it 
was successful because it increased children’s 
achievement motivation might be considered 
truly successful only if measured achieve- 
ment motivation could actually be shown to 
predict subsequent achievement. 

Since preschool children’s performances 
on achievement measures are themselves 
influenced by affective states of the child 
while taking the test (Zigler & Butterfield, 
1968), it is unclear whether independent 
assessment of relevant affective variables 
would increase predictions to later achieve- 
ment. One would expect such independent 
predictions for newly emerging affective 
feelings that have not had an opportunity to 
influence the early achievement scores. In- 
deed, a number of investigations report sig- 
nificant incremental validities for affective 
measures over what could have been pre- 
dicted solely from aptitude or achievement 
tests (e.g., Cattell, Barton, & Dielman, 1972; 
Khan, 1969). Nearly all such studies, how- 
ever, involve children from the third-grade 
level or beyond. 

There is some research, however, that as- 
sesses the ability of measures of self-esteem 
at the preschool or kindergarten level to 
predict later school achievement. One ex- 
ample is a study by Wattenberg and Clifford 
(1964), who related ratings of self-concept 
made at the beginning of kindergarten with 
reading test scores 2 years later. Self-con- 
cept scores were obtained from judges’ rat- 
ings of tape-recorded remarks made by 
children while drawing pictures of their 
families and while responding to a specially 
constructed incomplete-sentences test. For 
their measure of self-esteem (Quantified 
Self-Concept [good-bad]), significant pre- 
dictions to the reading score (at the .05 level, 
one-tailed) were found in only 4 of the 14 
subgroups in their analysis; the magnitude 
of the correlations was not reported. 

Research relating early indicators of 
achievement motivation to actual early ele- 
mentary school achievement also has been 
very limited, due largely to a lack of ade- 
quate measuring instruments of early mo- 
tivation. Assessment procedures that work 
well with older children and adults may not 
be feasible or valid with young children. One 
attempt to assess achievement motivation 
directly in preschool and kindergarten chil- 
dren is an objective-projective technique 
known as Gumpgookies that is designed to 
elicit choices between alternative behaviors 
that reflect differences in motivation (Ad- 
kins & Ballif, Note 1). While the authors 
provide some evidence of concurrent valid- 
ity, evidence on predictive validity is lacking. 
A commercial version of Gumpgookies, An- 
imal Crackers, is currently being nationally 
marketed in a “research edition,” although 
no information is yet available on its ability 
to predict school achievement. 

Another approach to the assessment of 
affective and social functioning in young 
children is the use of teacher or observer 
ratings. For example, Kohn and Rosman
(1974) found that kindergarten teacher ratings
of 209 lower- and middle-class boys on
three social-emotional variables (apathy- 
interest, anger-cooperation, and task ori- 
entation) were significantly related to 
achievement in second grade, especially for
the task orientation score. However, when
kindergarten measures of cognitive functioning
were included first in the prediction
equations, the affective-social variables did
not significantly add to prediction of arithmetic
or word knowledge and contributed
only an additional 3% of the variance for
predictions of reading achievement. Pusser
and McCandless (1974), with a longitudinal
sample of economically disadvantaged
children, used a number of factor analytically
derived “socialization dimensions” to
predict achievement at the end of [...]
grade from data obtained while they
were in prekindergarten classes. After entering
scores from the verbal facility [...]
in a multiple regression, a factor [...]
“coping with anxiety by aggression” contributed
significantly to the multiple correlation
for girls. This factor was defined
largely by the preschool teacher’s rating of
aggression. For boys, only the “alienation”
factor added significantly to the prediction.

Previous investigations of the relation of
affective-social behaviors to later academic
performance were necessarily limited by the
lack of a longitudinal data base that was
relatively comprehensive with respect to
children sampled or variety of measures included.
For example, Kohn and Rosman's
(1974) sample was limited to boys living in
New York City, and possible sex or location
differences obviously could not be disc[...].
Further, Kohn and Rosman's affective-social
measures were limited to teacher ratings,
and no self-report measures were used.
Pusser and McCandless (1974), with [...]

[...]dren, had no measure of achievement motivation
either from an individual child test or
from teacher ratings.

The current analyses address two main
questions: (a) the relation of measures of
self-esteem and achievement motivation
obtained in the Head Start year, kindergarten,
and first grade to reading and mathematics
achievement in the third grade and
(b) whether in the preschool years such
measures can improve predictions made
solely from a preschool achievement measure.
A criterion measure of problem-solving
ability also was included in order to investigate
possible differential predictions when
compared with the more directly school- 
oriented achievement measures. 


Method 


Subjects 


The sample for the current report is a subsample from 
an extensive longitudinal investigation of young chil- 
dren, most of whom come from economically disad- 
vantaged families. Sample selection procedures and 
initial sample characteristics have been presented 
elsewhere (Shipman, 1973). Briefly, in the fall of 1968 
four regionally distinct communities were selected 
which (a) had sufficient numbers of children in grade 
school and in the Head Start program, (b) appeared 
feasible for longitudinal study, given expressed com- 
munity and school cooperation and expected mobility 
rates, and (c) offered variation in preschool and primary 
grade experiences. The study sites chosen were Lee 
County, Alabama; Portland, Oregon; St. Louis, Mis- 
souri; and Trenton, New Jersey; however, the St. Louis 
site had to be dropped in the third year of the study, and 
children there were lost for further longitudinal analy- 
ses. Within these communities, elementary school dis- 
tricts with a substantial proportion of the population 
eligible for Head Start were selected. In each school 
district an attempt was made to test all nonphysically 
handicapped, English-speaking children who were ex- 
pected to enroll in first grade in the fall of 1971 (i.e., 
children of approximately 3½-4 years of age).

In 1969 mothers were interviewed and children were 
tested prior to their enrollment in Head Start or any 
other preschool program. For this initial four-site 
sample, at least partial data were obtained on a total of 
1,875 children of whom 62% were black and 53% were 
male. The current analysis focused on children from the 
longitudinal sample (i.e., children originally tested in 
1969) who had complete data and valid scores on Year 
6¹ Cooperative Primary Tests plus at least one of the
self-esteem or achievement-motivation measures from 
the first 4 years of the study. A substantial number of 
children from the original sample, though located for 
individual testing, were no longer in target classrooms 
(i.e., classrooms containing 50% or more children who 
had been previously tested) and therefore were not 
given the group achievement tests necessary for the 
current analysis. In addition to simply moving out of the 
district, the most frequent reasons for no longer being 
in target classrooms were being retained in or skipping 
a grade, enrollment in a private/parochial school, and, 
in Portland, exercising the option available there to be 
bussed to a different elementary school. 

Given the similarity of preliminary findings for 
Portland and Trenton, data from these two sites were 
pooled to form a combined urban/northern sample. Lee 
County is a basically rural southern county in which, 
given the absence of a public kindergarten program, 
Head Start was a kindergarten-level program rather 
than a prekindergarten program as it was in the urban 
sites. Therefore, Lee County was treated separately in 
all analyses. For simplicity of presentation, Portland
and Trenton are referred to as the urban sites and Lee 
County is referred to as the rural site; however, the 
reader should remember that Lee County differs from 
Portland and Trenton in more than just its level of ur- 
banization. Because of the small number of white chil- 
dren with the necessary scores who had attended Head 
Start and the fact that this small group of white children 
had somewhat different background characteristics, 
they were excluded from the current Head Start sam- 
ples. A white middle-socioeconomic-status (SES) 
sample from Lee County that attended private schools 
or kindergartens was included, however, to permit 
race/SES (race and SES are totally confounded) com- 
parisons in the rural site. In the urban Head Start 
sample there were 90 boys and 77 girls; in the rural Head 
Start sample, 89 boys and 72 girls; and in the rural 
“other preschool” group, 41 boys and 35 girls. 

On one common SES index (Census Bureau classifi- 
cation of head-of-household occupation [from profes- 
sional = 0 to laborer = 9, plus an additional category 
unemployed = 10]), the two Head Start samples were 
quite comparable (urban M = 7.51, SD = 2.35 and rural 
M = 7.31, SD = 1.78), although on a second index (the
highest grade in school attained by the mother) the 
urban sample was slightly higher (M = 10.39 vs. 9.32 
with SDs of 2.25 and 2.38, respectively). As intended, 
the Lee County "other preschool" sample was of sub- 
stantially higher SES, with a mean head-of-household 
occupational level of 1.78 (SD = 2.38) and a mean 
mother's educational level of 13.54 (SD = 2.54). 


Measures of Self-esteem 


Brown IDS Self-Concept Referents Test. This task
attempts to assess children's attitudes and feelings 
about their general ability, appearance, physical state, 
affective tone, and fears (Brown, Note 2). A full-length 
color Polaroid photograph of the child is taken, and 
after the tester verifies that the child recognizes her- 
self/himself in the picture the child is asked to respond 
to 14 bipolar items (e.g., “Is (child's name) happy or is 
she/he sad?"). After the 14 items are administered, the 
child is asked to respond to the same items again, this 
time answering as she/he thought her/his teacher would 
respond in describing how she/he felt. Thus, the task 
attempts to assess the child's perception of "self-as- 
object." The Brown was administered in Years 1, 2, 3,
and 4, although in Year 1 no teacher-referent items were
included and in Year 2 these items were administered
only to children attending a preschool.

¹Throughout this article “Year” refers to year of the
Longitudinal Study:
Year 1 = January to August 1969 (child age 3½-4½)
Year 2 = September 1969 to August 1970 (child age 4½-5½)
Year 3 = September 1970 to August 1971 (child age 5½-6½)
Year 4 = September 1971 to August 1972 (child age 6½-7½)
Year 5 = September 1972 to August 1973 (child age 7½-8½)
Year 6 = September 1973 to August 1974 (child age 8½-9½).


Perceived school success. In this interview item, four 
stick figures printed on a page are shown to the child, 
and he/she is asked to "point to the one that is most like 
you.” The tester explains that the first one is doing
“very good work in school,” the second one “pretty good 
work,” the third “not too good work,” and the fourth 
“very bad work in school.” This item was administered 
in Years 4 and 6 of the study as part of a longer school- 
perception interview, but only the Year 4 score was used 
in the current predictive analyses.


Coopersmith Self-Esteem Inventory (CSEI). Although
first administered in Year 6, the CSEI was included
in the current analysis in order to permit comparisons
to correlations from the earlier measures. This
instrument was designed to provide a general index of 
the child's feeling of self-worth and self-esteem (Coo- 
persmith, 1967). After the tester reads the item, the 
child is asked to make a mark on an answer sheet after 
either "like me" or "unlike me." The items include such 
statements as "I'm proud of my school work" and "I 
often feel upset in school." The version of the CSEI used 
in the present study contained 42 items. 


Measures of Achievement Motivation 


Gumpgookies. Gumpgookies consists of dichotomous
items to measure academic achievement
motivation (Adkins, Payne, & Ballif, 1972; Adkins &
Ballif, Note 1). The child is told that she/he has her/his [...]

[...] For this score, the child's teacher is asked to rate the
frequency of occurrence of five behaviors (e.g., “S[...]
with a job until she/he finishes it.”) on a 5-point scale
from “almost never” to “almost always.” The CBI scores
for first grade (Year 4) were included in the predictive
analyses, and third-grade (Year 6) scores were included
for comparison purposes.


Measures of Cognitive Performance 


Preschool Inventory (PSI). The PSI, developed by
Caldwell for use in Project Head Start as a general
achievement test for preschool children, taps a [...]
verbal, quantitative, and perceptual-motor skills defined
by teachers as expected of children [...] kindergarten.


Cooperative Primary Tests: Reading [...]
The Cooperative Primary Tests are [...]
standardized achievement tests designed for [...]
through third grade.


Raven Coloured Progressive Matrices ([...]
sion). Compared with the measures listed [...]
task is more a measure of problem-solving ability [...]
less a measure of specific school learning. It [...]
individual's ability to make perceptual discriminations,
to compare, and to reason by analogy.


Data Collection Procedures 


Individual child measures (Brown, individual
Gumpgookies, PSI, Raven, and child interview items)
were administered by specially trained local women,
most of whom were black housewives with limited [...]

[...] classes in all three sites. Local project staff, rather than
the children's teachers, administered the Cooperative [...]

[...] confidentiality in the information obtained. In addition to
data from child tests, information was obtained from
teacher ratings of study children and their [...]
in target classrooms with the Schaefer Classroom Behavior Inventory [...]

[...]duced testing in Year 4 because it contained the [...]
longitudinal subjects).
Table 2

Means, Standard Deviations, and Correlations for the Third-Grade Criterion Measures

                                                        Correlations
Measure               Group  Minimum n    M       SD    Math          [...]
Cooperative Primary   UHSM      75      24.65    7.93   .56**         .34**
Reading               UHSF      67      29.97    8.43   [illegible]   [illegible]
                      RHSM      72      23.11    8.10   [illegible]   .51
                      RHSF      63      26.34    8.58   .66**         [illegible]
                      ROPM      30      37.22    7.48   .14**         .41**
                      ROPF      22      42.06    5.27   .56**         [illegible]
Cooperative Primary   UHSM      72      28.47    8.07                 [illegible]
Math                  UHSF      67      30.46    9.19                 .30**
                      RHSM      72      21.62    8.60                 .95**
                      RHSF      66      28.55    7.72                 [illegible]
                      ROPM      30      45.24    9.28                 [illegible]
                      ROPF      23      46.46    8.51                 .63**
Raven Coloured        UHSM      17      20.20    4.31
Progressive Matrices  UHSF      71       2.83    5.32
                      RHSM      72      18.26    4.31
                      RHSF      67      17.28    3.95
                      ROPM      30      24.80    4.59
                      ROPF      23      26.34    5.21
Coopersmith           UHSM      76      24.68    6.60
Self-Esteem           UHSF      70      25.81    6.01
Inventory             RHSM      70      25.31    5.98
                      RHSF      66      26.33    6.16
                      ROPM      29      28.30    6.17
                      ROPF      22      30.78    6.38
Schaefer CBI          UHSM      78      13.29    5.24
Task Orientation      UHSF      73      15.85    6.15
                      RHSM      84      14.92    5.97
                      RHSF      71      17.77    6.31
                      ROPM      38      19.07    4.84
                      ROPF      33      22.38    3.38


Note. CSEI = Coopersmith Self-Esteem Inventory; CBI = Classroom Behavior Inventory; UHSM = urban Head
Start male; UHSF = urban Head Start female; RHSM = rural Head Start male; RHSF = rural Head Start female;
ROPM = rural other preschool male; ROPF = rural other preschool female.
* p < .05, one-tailed.
** p < .01, one-tailed.


scores are included also for comparison
purposes. Means, standard deviations, and
correlations among the third-grade measures
are presented in Table 2. For a variety of
reasons (e.g., child absence, tester error) a
child occasionally would not receive a valid
score on a particular instrument, which
caused the exact n on which each correlation
was based to vary slightly. For the purpose
of simplifying presentation, only the minimum
value of n for a particular row of correlations
is presented.

With the exception of the urban males,
correlations in the Head Start sample from
the Brown to the third-grade cognitive scores
were generally positive and significant in
Years 1 and 2 but absent in Years 3 and 4.
The relatively high predictions from the
Year 1 score indicate that the Brown administered
at ages 3½ [...]
cognitive performance; however, the [...]
correlations with the third-grade Coopersmith
Self-Esteem score suggest that it is
not assessing a stable general self-esteem
personality dimension.
It is possible that true variation in early
self-esteem is unrelated to variation in later
self-esteem, even though it is related to
variation in later achievement, for example,
by affecting the acquisition of preacademic
skills. However, it seems at least as likely [...]

[...]sented something other than variation in
self-esteem. For example, it might have reflected
intrinsic task motivation or attentiveness.
Some children might have listened
carefully to each item and considered both
alternatives before responding, whereas
others might have quickly chosen either the
first or the last alternative that they [...]

Reports from the testers in the field indi-
Table 3
Means, Standard Deviations, and Correlations for the Measures of Achievement Motivation

                                                                     Year 6 (third grade)
Measure           Group  Minimum n    M       SD    Concurrent PSI   Reading       Math          Raven
Year 2            UHSM      64      52.70   10.87   [illegible]      [illegible]   [illegible]   [illegible]
Gumpgookies       UHSF      60      52.29    9.89   [illegible]      [illegible]   .13           .29*
Year 3            UHSM      44      56.85   10.56   --               -.18          -.06          -.09
Gumpgookies       UHSF      40      54.42   11.35   --               .35*          .31*          .34
                  RHSM      74      53.73   11.14   .42**            .24*          .59**         .42**
                  RHSF      57      56.28   10.48   .33**            .39**         .55**         .16
Year 4            UHSM      72      49.87    9.12   --               .14           .12           .14
Gumpgookies       UHSF      64      52.93    6.65   --               [illegible]   .16           [illegible]
                  RHSM      88      53.31    5.29   --               [illegible]   .31**         .11
                  RHSF      68      54.34    5.13   --               .06           .24*          -.02
                  ROPM      40      50.00    7.65   --               [illegible]   [illegible]   .12
                  ROPF      33      52.91    6.24   --               .01           -.14          -.20
Year 4            UHSM      77      16.23    5.69   --               .11           .18           .20*
CBI Task          UHSF      64      18.67    5.52   --               .58**         .53**         .40**
Orientation       RHSM      88      14.88    6.58   --               .33**         .29**         .20*
                  RHSF      66      18.47    5.63   --               .26*          [illegible]   .16
                  ROPM      39      20.77    4.95   --               [illegible]   .56**         .43**
                  ROPF      33      20.65    5.22   --               .12           .08           .31
Year 4            UHSM      76       2.61     .41   --               [illegible]   .04           -.10
school            UHSF      69       2.59     .76   --               -.16          .08           -.10
enjoyment         RHSM      86       2.36     .84   --               .25**         .22*          [illegible]
                  RHSF      67       2.49     .83   --               .00           .08           -.10
                  ROPM      39       1.95     .92   --               .20           .29*          [illegible]
                  ROPF      31       2.21     .93   --               .21           .10           .02

Note. PSI = Preschool Inventory; CBI = Classroom Behavior Inventory; UHSM = urban Head Start male; UHSF = urban Head
Start female; RHSM = rural Head Start male; RHSF = rural Head Start female; ROPM = rural other preschool male; ROPF
= rural other preschool female.
* p < .05, one-tailed.
** p < .01, one-tailed.

cated that in fact many children did appear 
to be responding in the latter manner. On the 
assumption that true self-esteem was high 
in both groups, children in the former (at- 
tentive) group would receive higher scores 
than children in the latter (inattentive) 
group (since the position of positive re- 
sponses was counterbalanced), thus ac- 
counting for the correlation of the self-es- 
teem (attentiveness) scores with later cog- 
nitive performance. Consistent with this 
interpretation, the low correlation of the 
Year 1 Brown scores with concurrent PSI 
scores in the rural Head Start sample might 
reflect the relatively minor influences of in- 
trinsic motivation on the easier items of the 
PSI, which were the discriminating items in 
Year 1. These items primarily require only 
recall of basic information. The correlations
of the Year 1 Brown with Year 2 PSI scores
in this rural Head Start sample were somewhat
higher (rs to Year 2 PSI of .46 and .32
for boys and girls, respectively, and rs of .47
and .44 to Year 3 PSI), which suggests the
increased importance of intrinsic motivation
when the discriminating items required more 
than simple recall. 

To assess the extent to which performance 
on the Brown added to the prediction of 
performance on the third-grade cognitive 
measures made solely from the children's 
PSI scores, we ran multiple correlations, first 
entering the PSI then adding the Brown 
(except in the supplementary middle-SES 
“other preschool” sample where the ns were 
too small). In the urban sites, adding the 
Brown to the Year 1 PSI did not significantly 
increase the multiple correlation, but in the 
rural site the multiple correlation with 
third-grade math as the criterion signifi- 
cantly increased from .43 to .58 for boys and 
from .26 to .51 for girls. In Year 2 with the 
higher concurrent correlations of the Brown 
and PSI, the Brown contributed little to 
predictions of third-grade math perfor- 
mance, with the only significant increase for 
the sample of rural girls (R increased from 
.47 to .53). The perceived school-success
scores were so skewed as to make any interpretation
of correlational patterns highly
questionable. 
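
The incremental-validity comparison described above (entering the PSI first, then adding the Brown) can be illustrated with the standard F test for a change in R squared. This is only a sketch under stated assumptions: the article reports the two multiple correlations (.43 and .58 for rural boys) but not the exact n or the test it used, so the n of 72 below is an assumption borrowed from the minimum n listed for that group in Table 2, and the function name is hypothetical.

```python
def r2_change_f(r_reduced, r_full, n, k_full, k_added=1):
    """Return (delta R^2, F, df1, df2) for the gain from adding predictors.

    r_reduced: multiple R with the baseline predictor(s) only
    r_full:    multiple R after the new predictor(s) are added
    n:         number of cases (ASSUMED here, not reported per analysis)
    k_full:    number of predictors in the full equation
    k_added:   number of predictors added in the second step
    """
    r2_reduced = r_reduced ** 2
    r2_full = r_full ** 2
    delta = r2_full - r2_reduced
    df1 = k_added
    df2 = n - k_full - 1
    f_stat = (delta / df1) / ((1.0 - r2_full) / df2)
    return delta, f_stat, df1, df2


# Rural boys, third-grade math: R rose from .43 (PSI alone) to .58 (PSI + Brown).
# n = 72 is an assumption taken from the minimum n for this group in Table 2.
delta, f_stat, df1, df2 = r2_change_f(0.43, 0.58, n=72, k_full=2)
print(f"delta R^2 = {delta:.3f}, F({df1}, {df2}) = {f_stat:.2f}")
```

The resulting F would be compared with a tabled critical value for the given degrees of freedom; a different n would change the F but not the change in R squared.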


Achievement Motivation 


Means and standard deviations for the 
achievement motivation scores are presented 
in Table 3. (Since Gumpgookies was first 
administered during the Head Start year, 
there are no Year 2 Gumpgookies scores in 
the rural site. Note also that the n was re- 
duced in the urban sites in Year 3 because 
Gumpgookies in that year was group ad- 
ministered only in target classrooms.) Al- 
though Adkins et al. (1972) reported results 
based on a total score plus four scores de- 
rived from a factor analysis of the items, at- 
tempts to replicate their factors, even after 
partialing for response bias, were unsuc- 
cessful. Further, alpha coefficients in the 
high .80s and low .90s for the total score 
suggested that subscores were unnecessary. 
Although Adkins et al. found a correlation of 
.84 between age (which ranged from 39 to 76
months in their sample) and total score, in
the current sample with its more restricted
age range, the correlation of age with total
score was only .17 in Year 2, .07 in Year 3,
and .08 in Year 4. Hence, the conversion to 
age-normed Z scores described by Adkins et 
al. was not necessary in this sample. 

Consistent with previous findings (Adkins
& Ballif, Note 1), the means during the preschool
and kindergarten years were relatively
high and increased with age, although they
did not approach the maximum possible
score of 75. However, on the 60-item first-
grade version, scores were approaching
ceiling levels. Similarly, in first grade, responses
to the item on self-reported school
enjoyment were close to the maximum of 3.0.
However, in first grade, teacher perceptions
were not so uniformly positive as were the
self-reports. Average teacher ratings of task
orientation, while generally positive, were
closer to the midpoint value of 15 than to the
maximum possible score of 25. In both the
urban and rural Head Start samples, girls
were rated significantly higher in task orientation:
in the urban sites, t(151) = 2.69, p
< .01; in the rural site, t(156) = 3.69, p < .01.
Previously reported findings on the same
sample (Emmerich, 1971) indicated similar
sex differences when trained observers rated
children’s task orientation during free play
in urban Head Start classes.
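
As a rough check on the sex difference in task orientation, a pooled-variance t statistic can be recomputed from summary statistics alone. The sketch below assumes the urban Year 4 CBI means, standard deviations, and minimum ns as read from Table 3; because those ns are minimums, the degrees of freedom and t will not match the reported t(151) = 2.69 exactly. The function name is hypothetical.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2) / se, df


# Urban Head Start, Year 4 CBI task orientation (values as read from Table 3;
# the ns are the table's MINIMUM ns, so this only approximates the reported test).
t_stat, df = pooled_t(18.67, 5.52, 64,   # girls
                      16.23, 5.69, 77)   # boys
print(f"t({df}) = {t_stat:.2f}")
```

The recomputed value lands close to the reported one, which is what would be expected if the published test used slightly larger ns than the minimums shown in the table.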

Since Year 3 Gumpgookies was group
administered in target Head Start classrooms,
no scores on it were available for the
white middle-SES “other preschool” sample.
In Year 4 there were no race/SES differences
for rural girls, but for boys, first-grade
Gumpgookies scores were significantly
higher in the rural Head Start sample than
in the rural “other preschool” sample, t(126)
= 2.45, p < .05. This is in direct opposition to
previous findings on the relation of SES to
achievement motivation (e.g., Adkins et al.,
1972). However, the Year 4 Gumpgookies
scores in the “other preschool” sample did
not correlate with teacher ratings of Year 4
task orientation in first grade and were not
predictive of later achievement; the score
apparently has different meanings in the two
groups.

Self-reported school enjoyment was higher
for children in the Head Start sample, with
the differences for boys reaching statistical
significance, t(123) = 2.37, p < .05. On
ratings of task orientation made by
first-grade teachers, mean scores were significantly
higher for the [...]
“other preschool” boys than for the black
lower-SES boys in the rural Head Start
sample, t(126) = 5.58, p < .01, although the
difference for girls was not significant, t(10[...])
= 1.93, p > .05. Unlike the Head Start sample,
in the “other preschool” sample there
was no significant sex difference in the task
orientation ratings.
To correct for the moderate skew of the
Gumpgookies scores, we normalized them
prior to the correlational analyses reported
in Table 3. In the urban sites, Gumpgookies
scores in Year 2 were predictive of third-
grade reading for boys and girls, and Raven
scores for girls only. They were significantly
correlated with concurrent PSI performance
only for girls. By the kindergarten year in the
urban sites (Year 3), Gumpgookies was no
longer predictive of any of the third-grade
scores for boys, although for girls it continued
to predict reading and Raven scores, and
in addition was predictive of math performance.
In the rural site, Year 3 Gumpgookies
scores were significantly related to third-grade
performance in both reading and math
and, for boys only, to Raven scores. Furthermore,
the correlations with math scores
were fairly substantial, accounting for
30%-35% of the variance.
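
The "variance accounted for" figure is simply the squared correlation. The short sketch below applies this to the rural Year 3 Gumpgookies correlations with third-grade math as read from Table 3 (.59 for boys; the girls' value is partly garbled in the scan and is assumed here to be .55). The function name is hypothetical.

```python
def variance_explained(r):
    """Percentage of criterion variance accounted for by a correlation r."""
    return 100.0 * r ** 2


# Rural Year 3 Gumpgookies with third-grade math (Table 3).
# .59 is read directly; .55 is an ASSUMED reading of a garbled table cell.
boys_pct = variance_explained(0.59)
girls_pct = variance_explained(0.55)
print(f"boys: {boys_pct:.1f}%, girls: {girls_pct:.1f}%")
```

Under those readings the two values fall at the ends of the range stated in the text.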

In the urban sites, neither of the first- 
grade self-report measures (Gumpgookies 
and school enjoyment) was significantly re- 
lated to the third-grade cognitive-perceptual 
scores. In the rural site, reported school en- 
joyment by boys was significantly related to 
the cognitive-perceptual scores, although 
the largest correlation accounted for less 
than 7% of the variance. First-grade 
Gumpgookies scores in the rural site were 
still significantly related to third-grade math 
performance for boys and girls, although to 
a significantly lesser extent than were the 
scores from Gumpgookies administered 
during the previous year. Gumpgookies 
scores were significantly related to reading 
scores only for boys and were not related to 
Raven scores for either boys or girls. Thus, 
for children similar to those in this sample, 
the age period 4-5½ (i.e., prior to entry into
first grade) appears to be a critical time for 
the administration of Gumpgookies since 
there is a notable drop in its predictive va- 
lidity the following year. 

In the urban sites, task orientation ratings 
were predictive of reading and math per- 
formance only for girls, but in the rural site 
these teacher ratings were predictive for both 
boys and girls. Low but statistically signifi- 
cant correlations with Raven scores were 
obtained for all groups except rural girls. In 
the urban sites, the statistically significant 
differences between boys and girls in the 
predictions of both reading (z =3.18, p <.01) 
and math (z = 2.43, p <.05) may reflect 
greater variability over time of achieve- 
ment-related behaviors for boys or it may 
reflect a greater difficulty on the part of 
first-grade teachers in identifying predictive 
achievement-related behaviors in urban 
boys. The lack of prediction for boys was 
apparently not caused by a lack of variability 
in the teacher ratings since the standard 
deviations of the ratings were almost iden- 
tical in the two sex groups. 
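
The z values reported for the boy-girl difference in predictions are consistent with the usual Fisher r-to-z test for two independent correlations, although the article does not name the procedure. The sketch below assumes that test and uses the urban Year 4 task orientation correlations with third-grade reading as read from Table 3 (.58 for girls; the boys' cell is garbled and is assumed to be .11), together with the table's minimum ns. The function name is hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z test for the difference between two independent correlations."""
    z1 = math.atanh(r1)              # Fisher r-to-z transform
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_tailed


# Urban sites, task orientation -> third-grade reading (Table 3).
# Girls' r = .58 is read directly; boys' r = .11 is an ASSUMED reading of a
# garbled cell, and both ns are the table's minimum ns.
z, p = fisher_z_difference(0.58, 64, 0.11, 77)
print(f"z = {z:.2f}, two-tailed p = {p:.4f}")
```

With these inputs the statistic comes out near the reported z for reading, which supports (but does not prove) the assumption about which test was used.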
In the urban sample, adding Year 2
Gumpgookies scores to the PSI significantly 
improved predictions of reading scores only 
for boys (R increased from .29 to .43) and 
added nothing to predictions of math and 
Raven scores for either urban boys or urban 
girls. Gumpgookies scores in Year 3 added 
nothing for the urban sample, but for the 
rural sample they significantly improved 
predictions of third-grade reading for girls 
(R increased from .42 to .50) and Raven 
scores for boys (R increased from .27 to .43). 
Gumpgookies performance contributed most 
in the rural site for predictions of math 
scores. It added significantly to predictions 
for both boys and girls, accounting for an 
additional 18%-19% of the math variance; for
boys, R increased from .45 to .62, and for girls 
from .45 to .62. 


Discussion 


The high mean scores on both the Brown 
IDS Self-Concept Referents Test and the 
“school success” item suggest that in both 
middle- and low-SES samples self-esteem is 
quite high in the preschool years and 
through the first grade. By third grade more 
variation in self-esteem was noted, and these 
third-grade scores were more strongly re- 
lated to concurrent achievement measures. 
The current results are thus consistent with 
previous suggestions (Calsyn, 1973; Kifer, 
1975) that differences in academic self-es- 
teem develop as a reaction to school success 
and failure rather than act as a cause of such 
school performance. An implication for ed- 
ucation appears to be that teachers in the 
early elementary grades should be alert to 
the behaviors of themselves and others in the 
school setting which may decrease a child’s 
initially high level of self-esteem. 

The correlational analyses with the Brown 
indicated the predictive value of this measure
with children under age 5, although the
predictive variation may be related to differences
in task motivation and understanding
rather than differences in self-esteem.
This illustrates the critical point that
the predictive validation of a measuring in- 
strument must not be confused with estab- 
lishing the predictive validity of a theoretical 
construct. The hypothesis that these scores
are actually reflecting differences in atten- 
tiveness and task motivation could be as- 
sessed in future studies by including tester 
ratings of these behaviors obtained imme- 
diately after the child finished the task; 
scores of the inattentive children could then 
be treated separately in the analyses. How- 
ever, if inattentive children were truly low in 
self-esteem, this information would be 
lost. 

The drop in predictive validity beyond age 
5 indicates an apparent critical period for the 
Brown to assess whatever it is that it is 
measuring. It also demonstrates that mea- 
sures close in time need not correlate more 
highly than the same measures with a greater 
interval between assessments, especially at 
young ages when developmental changes in 
children and the way they respond to test 
items occur fairly rapidly. 

Gumpgookies scores, especially in Year 3 
in the rural site, were significantly related to 
the third-grade cognitive measures. Not all 
the predictive variation in Gumpgookies was 
already reflected in concurrent cognitive 
measures since it added significantly to 
predictions from the concurrent PSI. Thus, 
Gumpgookies apparently assesses achieve- 
ment-related attitudes that are important 
for later school achievement but are not yet 
totally reflected in concurrent achievement 
measures. Since natural variation in 
achievement motivation as defined by the 
Gumpgookies test (i.e., liking school activi- 
ties, feeling positive about one's self as a 
learner, expecting to succeed, persevering in 
attempts to succeed, and knowing mecha- 
nisms/tools that will enable one to succeed) 
appears to make a substantial independent 
contribution to predictions of academic 
achievement, preschool programs designed 
to develop these attitudes might make a 
substantial contribution to the child's later 
success in school.

Although there were a number of significant
predictions from preschool Gumpgookies
scores, especially in the rural site, by
first grade Gumpgookies scores were less
predictive of later achievement. As with the
Brown, there is apparently a critical time for
the administration of Gumpgookies. Perhaps
as children get older they are more likely to
take time to think of the socially desirable
response, which results in scores near the
ceiling level that are necessarily less predictive.
If this hypothesis is correct, future
modifications of Gumpgookies to improve
the apparent social desirability of the nonachievement
response might be useful (e.g.,
this Gumpgookie works at his/her own paper
on his/her desk, and this Gumpgookie helps
the teacher clean the blackboard). Thus, a
child motivated to achieve social acceptance
from the teacher could be distinguished from
the child motivated to achieve academically.
Another hypothesis for superior predictions
from the earlier scores is that Gumpgookies
is more differentiating during the period
when achievement attitudes are in a
formative stage and children are first exposed
to a major emphasis on school-oriented
achievement; thus scores may reflect
also the child's readiness to assume [...]
motivation.

Self-reports by low-SES black children in first grade indicated
that stereotypes of these children as disliking school and being
disinterested in academic achievement are incorrect; however,
these positive attitudes were not reflected in their basic reading
and math skills or, especially in boys, in task-oriented behaviors
as perceived by their teachers. Thus, although developing positive
attitudes may be necessary for school success, it is obviously not
sufficient; teachers also must provide adequate instruction in the
appropriate task-related behaviors. Also, the school environment
must reinforce and sustain such interest and motivation. 

Further analyses are needed to explore a number of specific
issues. For example, the apparently stronger association of
achievement motivation to math performance than to reading or
Raven performance needs to be explained. It may be that this
relation is found because math is less intrinsically motivating,
instruction in math is less individualized, and/or a greater
complexity of skills is involved in reading and the items of the
Raven, thus leading to more reliance on general
cognitive-perceptual abilities and previously acquired skills; or
it may be something unique to the particular achievement measures
used in this study. Since the reliance of the Brown on children's 
ability to verbalize their conception of self 
emphasizes the cognitive components of 
self-esteem, different techniques for as- 
sessing self-esteem at the preschool level 
(e.g., observer ratings instead of a self-report 
measure) should also be explored. Although 
the focus of this report is on its relation to 
school achievement, that is obviously not the 
sole reason for studying self-esteem; the 
development of self-esteem in preschool 
children deserves further study whether or 
not it can be shown to predict later achieve- 
ment. To nurture it, however, we must be 
able to recognize its many manifestations. 
The variations in correlational patterns 
across the various subsamples also need to 
be explored, especially the general lack of 
significant correlations found for the urban 
Head Start males. The poor predictions 
cannot be ascribed solely to the sex, race, or 
SES of this group because of the fairly sub- 
stantial correlations found for the rural 
children of the same sex, race, and approxi- 
mately the same (or slightly lower) SES. In 
an attempt to unravel the complex sex and 
site variations, differences in the home and 
school environments of these children need 
to be more closely examined. For example, 
Follow Through programs were available 
only in the urban sites. If these programs 
were especially helpful for the least moti- 
vated boys, the lower correlations for that 
group would make sense and could even be 
seen as a positive program impact. This 
could be clarified by comparing children 
from the same site who were or were not ex- 
posed to Follow Through. Better yet would 
be the use of classroom observations that 
identified differential teacher reinforce- 
ments of achievement-related behaviors. 
Presently planned analyses will investigate 
the extent of differential classroom experiences
according to the child's sex, race, and 

Effects of Prior Testlike Events and Meaningfulness of 
Information on Numeric and Comparative Reasoning 


Richard E. Mayer 
University of California, Santa Barbara 


One hundred sixty subjects read four passages that each described
quantitative relations among elements in a four-term linear ordering
(e.g., A = 2B, B = 10C, C = 5D) and answered either only 12 numeric
(A = 20C?) or only 12 comparative (A > C?) questions after each
passage. On the fifth (target) passage, on which all subjects received
both numeric and comparative questions, subjects performed relatively
better on the type of questions they had received previously. There
was no difference between the groups in ability to make inferences.
However, information stated as meaningful stories resulted in better
performance on comparative questions and on inference, while
information stated as nonsense equations resulted in the reverse trend. 


When a subject is presented with quantitative information, such as
A = 5 + B, B = 2 + C, C = 4 + D, there may be several factors
influencing how the material is stored in memory and used to solve
problems. Potts (1972, 1974) has shown that when subjects read about
a linear ordering without quantitative details (e.g., A > B, B > C,
C > D), 
they performed just as well or better on an- 
swering questions that required inference (A 
> D?) as on retention (B > C?). However, 
Suppes and his associates (Loftus & Suppes, 
1972; Suppes, Loftus, & Jerman, 1969) have 
demonstrated the reverse pattern with re- 
spect to subjects’ answers to algebra story 
problems; difficulty of solution was posi- 
tively related to the number of computa- 
tional inference steps. These findings cor- 
respond to the intuitions of mathematics 
educators (Polya, 1968) that reasoning 
without specific quantitative information 
(e.g., estimating) is a qualitatively different 
process from reasoning with quantities (e.g., 
computations). 

One factor influencing how quantitative 
information is stored by a learner may be 
whether he or she expects to use the infor- 
mation to make comparative judgments 
(such as Potts’s) or to make numeric judg- 
ments (such as Suppes’s). To encourage 
subjects to construct cognitive structures for 
comparative or numeric reasoning, subjects 
were given only one type of question after 
each of four passages and were tested on 
both types after a fifth (target) passage. This 
technique is similar to that used by 
McConkie, Rayner, and Wilson (1973) and 
by Mayer (1975) to investigate “forward 
transfer” of reading strategies resulting from 
prior test items and should help to extend 
those findings. 

One hypothesis is that subjects expecting 
only numeric questions build cognitive 
structures that include more detail (i.e., they 
store the actual numerical quantities as well 
as the direction of relations among elements) 
relative to subjects expecting only compar- 
ative questions. This idea would be consis- 
tent with a pattern of performance in which 
subjects with the numeric problem-solving 
set perform better on numeric questions on 
a target passage, while subjects with the 
comparative set perform better on compar- 
ative questions. 

Another hypothesis is that cognitive 
structures which support numeric reasoning 
differ qualitatively as well as in the amount 
from structures acquired to support comparative
reasoning; for example, numeric 
structures could retain the original presen- 
tation organization such as being encoded as 
three propositions, while comparative 
structures could involve more integration 
such as a spatial list. This idea would be 
supported by a pattern in which subjects 
with numeric structures excelled on ques- 
tions involving retention of presented 
propositions, while subjects with comparative
structures excelled on making inferences. 

A second factor which may influence how 
quantitative information is stored and used 
is the context or meaningfulness of the ma- 
terial. Previous experiments concerned with 
solving simultaneous algebraic equations 
(Mayer & Greeno, 1975) have suggested that 
information presented in a meaningful story 
is easier to make inferences from than in- 
formation presented as nonsense equations. 
One explanation is that meaningful infor- 
mation is assimilated to the subject’s past 
experience and hence restructured and in- 
tegrated, while nonsense information is 
added to memory and retained in the form 
presented. 

The present experiment attempted to 
compare the effects of these two input pre- 
sentation formats on numeric and compar- 
ative reasoning and to determine whether 
the two formats result in the acquisition of 
different types of cognitive structure. One 
hypothesis is that equations result in the 
acquisition of more quantitative detail rel- 
ative to story presentation, as would be 
supported by a pattern in which subjects 
given equations excel on numeric questions 
and subjects given a story excel on compar- 
ative questions. Another hypothesis is that 
subjects given equations acquire structures 
which more closely retain the presentation 
organization, while story format results in 
more integrated, assimilated structures. This 
hypothesis would be supported by a pattern 
of performance in which subjects given 
equations excel on retention questions and 
subjects given meaningful story format excel 
on inference questions. 


Experiment 1 


Experiment 1 investigated the effects of 
problem-solving set (numeric vs. comparative) 
and meaningfulness of presentation on 
the cognitive structure formed for a four-term 
quantitative linear ordering. 


Method 


Subjects and design. Subjects were 80 Indiana University students
who participated in the experiment to fulfill a requirement for their
introductory psychology courses. Twenty subjects served in each cell
of a 2 × 2 factorial design, with the factors being whether 12 numeric
or 12 comparative questions were asked after each of the first four
passages (numeric vs. comparative problem-solving set) and whether the
passages and questions were presented as meaningful stories or as
equations (story vs. equation format). Since all subjects read four
set-inducing passages and answered the same 24 numeric and comparative
questions after the fifth (i.e., target) passage, comparisons by type
of question for the target passage are within-subjects comparisons. 
Materials. Five four-term linear orderings (e.g., A > B > C > D)
that incorporated quantitative relationships among the elements were
constructed; each was expressed in two ways. Passages presented in
story form were given as meaningful paragraphs, while passages
presented in equation form expressed each sentence in the story as an
equation by assigning a letter to each variable. All passages were
presented in adjacent-link organization, that is, the A-B, B-C, and
C-D links were given in order. For example, the target passage was
expressed as shown below: 


Story: In a certain forest the animals are voting for 
their leader. The frog gets twice as many votes as 
the hawk. The hawk gets five times as many votes as 
the rabbit. The rabbit gets four times as many votes 
as the bear. 


Equation: F = 2 × H, H = 5 × R, R = 4 × B. 


The four other passages were, respectively, based on children
trading different colored marbles on a playground, baseball players
comparing their batting records, comparisons of the number of
listeners among television stations in a town, and comparisons among
U.S. cities in amount of rainfall. 
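
The quantitative relations in the target passage can be worked through explicitly. The following Python sketch is an illustration only, not part of the original materials: the names ORDER, ADJACENT, and ratio are hypothetical, and the single letters stand for frog, hawk, rabbit, and bear. It multiplies the adjacent links along the ordering to obtain each remote link.

```python
from fractions import Fraction

# Adjacent links of the target passage: F = 2 x H, H = 5 x R, R = 4 x B.
ORDER = ["F", "H", "R", "B"]
ADJACENT = {("F", "H"): 2, ("H", "R"): 5, ("R", "B"): 4}


def ratio(a, b):
    """Return how many times as many votes a gets as b (a precedes b)."""
    i, j = ORDER.index(a), ORDER.index(b)
    if i >= j:
        raise ValueError("a must precede b in the ordering")
    result = Fraction(1)
    # Each step along the ordering is one computational inference step.
    for k in range(i, j):
        result *= ADJACENT[(ORDER[k], ORDER[k + 1])]
    return result


# The three remote links are products of the adjacent links between them.
remote = {pair: ratio(*pair) for pair in [("F", "R"), ("H", "B"), ("F", "B")]}
```

Under these relations the frog gets 10 times as many votes as the rabbit, so a question asking whether it gets 20 times as many has a negative answer, while 20 is the correct quantity for the hawk-bear link.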

For each of the five linear ordering passages, two sets of 24
questions that could be answered “yes” or “no” were constructed. One
set expressed the questions in terms of words from a meaningful story
and one set expressed the questions in terms of letters from
equations. The questions were based on a 2 × 2 × 2 × 3 design:
(a) whether the question asked about a numeric or a comparative
relation between two variables (numeric vs. comparative); (b) whether
the correct answer was “yes” or “no” (positive vs. negative answer);
(c) whether the question asked about adjacent links such as A-B, B-C,
or C-D or about remote links such as A-C, B-D, or A-D (adjacent vs.
remote type); and (d) which particular link was involved. Examples of
story questions for the target passage are as follows: 


Numeric-Positive-Adjacent (A-B): Does the frog get 
twice as many votes as the hawk? 

Numeric-Negative-Remote (A-C): Does the frog get 
20 times as many votes as the rabbit? 

Comparative-Positive-Remote (B-D): Does the 
hawk get more votes than the bear? 

Comparative-Negative-Adjacent (C-D): Does the 
bear get more votes than the rabbit? 


Story questions all were in the form “[number] as many as” or “more
than” rather than “[fraction] as many as” or “less than”; all
quantities used in numeric-negative questions were numbers that were
correct for one of the numeric-positive questions. The equation set
was identical to the story set except that questions were stated as
equations, for example, F = 20 × R (Numeric-Negative-Remote; A-C) or
H > B (Comparative-Positive-Remote; B-D). 
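
Crossing the four question factors described above (quantification, correct answer, link type, and particular link) yields the 24 questions asked about each passage. The Python sketch below only enumerates the cells of that design; the variable names are hypothetical, and it does not reproduce the actual wording of the questions.

```python
from itertools import product

QUANTIFICATION = ("numeric", "comparative")
ANSWER = ("positive", "negative")
# Three particular links exist within each link type.
LINKS = {
    "adjacent": ("A-B", "B-C", "C-D"),
    "remote": ("A-C", "B-D", "A-D"),
}

# One question per combination of quantification, answer, type, and link.
questions = [
    (quantification, answer, link_type, link)
    for quantification, answer, link_type in product(QUANTIFICATION, ANSWER, LINKS)
    for link in LINKS[link_type]
]

numeric_only = [q for q in questions if q[0] == "numeric"]
comparative_only = [q for q in questions if q[0] == "comparative"]
```

The enumeration gives 24 questions in all, 12 numeric and 12 comparative, with three questions in each Type × Answer cell of either subset, which matches the "3 observations" per cell reported for the comparative-only tables.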

Procedure. Subjects were tested in groups of up to 
eight per session with subjects randomly assigned to 
treatments. Each subject was seated in an individual 
booth in front of a cathode-ray tube screen (40 X 8 
character display) and a two-button response box that 
were controlled by an IBM 1800 computer. Following 
a brief verbal introduction, instructions for the exper- 
iment were presented to subjects individually via their 
screens. Subjects were told to press a response box 
button to see the first passage, to press the same button 
when finished reading, and to answer the subsequent 
questions by pressing the response box button labeled 
“YES” or “NO.” As soon as the subject pressed a button 
in response to a question, the question disappeared and 
a new question appeared on the screen. When the 
subject finished a set of questions, the following message 
appeared on the screen: “You were —% correct on this 
set of questions. Press either button when you are ready 
for the next passage." 

For each of the first four passages, each subject re- 
ceived either 12 numeric questions (numeric set) or 12 
comparative questions (comparative set), but after the 
fifth (target) passage all subjects received all 24 items 
including 12 numeric and 12 comparative items. 
Subjects in the story and equation groups received ap- 
propriate passages and questions. All passages con- 
tained quantitative relations. 


Results and Discussion 


The response time and whether the an- 
swer was correct were recorded by the com- 
puter for each question answered by each 
subject. Because high error rates compli- 
cated the interpretability of the response 
time data, they were not treated; however, 
several analyses of variance were performed 
on various partitions of the proportion cor- 
rect data. 

Table 1 shows the proportion correct response in the four
set-inducing passages for the four treatment groups (i.e., a
between-subjects comparison) by question type and answer; an analysis
of variance was performed on these data. There was the expected
learning effect, reflected in a steady improvement of .75, .85, .88,
.89 proportion correct from Passage 1 to 4, MS = 1.50; F(3, 228) =
17.14, p < .001; however, since there were no reliable interactions
involving the passage variable, the data in Table 1 have been summed
over all four passages. 

Subjects answering only comparative questions performed better
overall than subjects who answered matched numeric questions, MS =
23.12; F(1, 76) = 38.30, p < .001. In addition, there were reliably
different patterns of performance in which subjects answering
comparative questions performed better or the same on questions
requiring inference (i.e., remote type) relative to questions
involving retention (i.e., adjacent type), while the subjects
answering only numeric questions showed the reverse pattern, Set ×
Type of Question interaction: MS = 4.27; F(1, 76) = 31.86, p < .001.
Another difference in the pattern of performance was that numeric
questions tended to elicit more “no” responses, especially on
questions involving inference, relative to comparative questions, Set
× Answer of Question interaction: MS = 2.92; F(1, 76) = 12.72, p <
.001; Set × Type × Answer of Question interaction: MS = 3.75; F(1, 76)
= 18.75, p < .001. 


Table 1 
Proportion Correct Response on Set-Inducing Passages for Four Groups by Type and Answer of 
Question in Experiment 1 

                        Type and answer of question     Type of         Answer of
Treatment               Positive        Negative        question        question
(Set-Format)            A      R        A      R        A      R        Positive  Negative
Comparative-Story       .94    .94      .93    .95      .93    .94      .94       .94
Numeric-Story           .87    .71      .87    .85      .87    .78      .79       .86
Comparative-Equation    .91    .92      .88    .89      .90    .91      .92       .89
Numeric-Equation        .82    [illegible]  .74  .78    .78    .62      .65       .76

Note. A = adjacent; R = remote. The MS for residual error is .20; each cell is based on 12 observations of 20 subjects. For Tables 
1 and 4, comparative set subjects received only comparative questions; numeric set subjects received only numeric questions. 
Main effect of set, p < .001; main effect of format, p < .005; Set × Type interaction, p < .001; Set × Answer interaction, p < .001; 
Set × Type × Answer interaction, p < .001; Format × Type interaction, ns; Format × Answer interaction, ns; Format × Type 
× Answer interaction, p < .10. 


Table 2 
Proportion Correct Response on Target Passage for Four Groups by Quantification of 
Question (All Questions) in Experiment 1 

Treatment               Quantification of question
(Set-Format)            Comparative     Numeric
Comparative-Story       .98             .61
Numeric-Story           .93             .80
Comparative-Equation    .94             .60
Numeric-Equation        .71             .75

Note. The MS for residual error is .28; each cell is based on 12 
observations of 20 subjects. Set × Quantification interaction, 
p < .001; Format × Quantification interaction, p < .05. 

Apparently, in quantitative reasoning, 
making correct inferences from the given 
data is much more difficult than in com- 
parative reasoning, probably due to the ad- 
ditional computational steps required for 
each inference. These results allow the 
elimination of one model which proposes 
that comparative problems require—like 
numeric problems—that the subject calcu- 
late the exact numerical relationship and 
then determine the more general question of 
which is greater. These results are consistent 
with the intuitions of mathematics instruc- 
tors that comparative judgments of relations 
may require an entirely different reasoning 
process than actual calculations, that is, that 
comparative and numeric reasoning in humans (a) are two distinctly
different processes or (b) involve different levels of processing. In
addition, providing information in story form resulted in better
overall performance than equation form, MS = 5.86; F(1, 76) = 9.61, p
< .005. However, there was no reliable Format × Type of Question
interaction, MS = .21; F(1, 76) = 1.53, ns, no reliable Format ×
Answer of Question interaction (MS = .02, F < 1), and only a
marginally reliable Format × Type × Answer of Question interaction, MS
= .70; F(1, 76) = 3.50, p < .10, and hence there is a lack of support
for the idea that presentation format influenced the type of reasoning
process. 
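
The contrast between the rejected model (calculate the exact ratio, then judge which term is greater) and a purely ordinal judgment can be made concrete. The Python sketch below is an illustration under stated assumptions, not a model proposed in this report: the function names are hypothetical, the relations are those of the target passage, and the "steps" count is only a rough stand-in for computational inference steps.

```python
ORDER = ["F", "H", "R", "B"]  # frog > hawk > rabbit > bear
ADJACENT = {("F", "H"): 2, ("H", "R"): 5, ("R", "B"): 4}


def more_by_calculation(a, b):
    """Compute the exact ratio between the terms, then judge 'a more than b'."""
    i, j = ORDER.index(a), ORDER.index(b)
    lo, hi = min(i, j), max(i, j)
    ratio, steps = 1, 0
    for k in range(lo, hi):
        ratio *= ADJACENT[(ORDER[k], ORDER[k + 1])]
        steps += 1  # one multiplication per link crossed
    return (i < j and ratio > 1), steps


def more_by_ordering(a, b):
    """Judge 'a more than b' from positions in the ordering; no arithmetic."""
    return ORDER.index(a) < ORDER.index(b), 0
```

Both functions give the same yes/no answers, but only the first needs more steps for remote links than for adjacent ones. A model of the first kind would therefore predict poorer performance on remote comparative questions, which is not what the comparative set subjects showed.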

Table 2 shows the proportion correct response on all 24 numeric and
comparative questions in the target passage (a within-subjects
comparison) for each of the four treatment groups. An analysis of
variance revealed that subjects who had solved only numeric questions
on the first four passages excelled on the 12 numeric questions and
performed worse on the comparative questions relative to the
comparative set subjects, Set × Quantification of Question
interaction, MS = 11.72; F(1, 76) = 41.60, p < .001. These results
suggest an overall difference in the amount of information stored by
subjects with different problem-solving sets; numeric set subjects
apparently encoded the material in a way that retained more detail
concerning the quantities involved. That numeric set subjects
performed more poorly on comparative questions relative to the
comparative set subjects is consistent with the comparative questions
requiring an inference or translation step (e.g., from “five times as
many” to “more than” for numeric set subjects). 

In addition, subjects who read a meaningful story performed much
better on comparative questions than on numeric questions, while the
relative difference was much smaller for subjects who had nonsense
equations, Format × Quantification of Question, MS = 1.01; F(1, 76) =
4.00, p < .05. Apparently, equation format is more suitable for
numeric calculation, while meaningful information provides for much
better estimating or comparative reasoning. It is particularly
significant that presentation format did seem to have an effect on the
way information is structured in memory, at least on how much of the
originally presented detail is retained. Earlier work (Mayer & Greeno,
1975) has suggested that equation format may result in “adding”
information to memory in a way that retains the original organization,
while meaningful format allows subjects to “assimilate” the
information and connect it with other systems of knowledge, a process
that results in loss of some of the original detail. 

Table 3 summarizes the proportion correct on only the 12 comparative
questions that were answered by the four groups after the fifth
(target) passage. The comparative set subjects performed better
overall than the numeric set subjects, MS = 4.96; F(1, 76) = 12.32, p
< .001, but all groups displayed a tendency consistent with Potts
(1972) to perform just as well or better on inference questions as on
retention questions (Set × Type interaction, MS = .01, F < 1). That
numeric set subjects were more likely than comparative set subjects to
respond “no,” Set × Answer interaction, MS = .13; F(1, 76) = 6.30, p <
.025, suggests that prior problem-solving experience may influence
response criteria, but there is no evidence of a difference in the
organization of stored knowledge due to problem-solving set. 

In addition, there was no evidence of a pattern in which one
presentation format led to better performance on retention and another
type of format led to better performance on inference (Format × Type
interaction, MS = .00, F < 1). Mayer and Greeno (1975) did obtain such
an interaction and hence, evidence for different organizations of
memory; however, the materials were more complex and lengthy. Finally,
there was a tendency for subjects given meaningful stories to respond
“no” more often relative to the equation format groups, Format ×
Answer interaction, MS = .38; F(1, 76) = 18.80, p < .001, and hence,
some evidence that presentation format may influence response
criteria. 


Table 3 
Proportion Correct Response on Target Passage for Four Groups by Type and Answer of 
Question (Comparative Questions Only) in Experiment 1 

                        Type and answer of question     Type of         Answer of
Treatment               Positive        Negative        question        question
(Set-Format)            A      R        A      R        A      R        Positive  Negative
Comparative-Story       .93    1.00     1.00   .98      .97    .99      .97       .99
Numeric-Story           .90    .93      .93    .93      .92    .93      .92       .93
Comparative-Equation    .95    .95      .93    .93      .94    .94      .95       .93
Numeric-Equation        .73    .78      .65    .67      .69    [illegible]  .76   .66

Note. A = adjacent; R = remote. The MS for residual error is .04; each cell is based on 3 observations of 20 subjects. Main effect 
of set, p < .001; Set × Type interaction, ns; Set × Answer interaction, p < .025; Format × Type interaction, ns; Format × Answer 
interaction, p < .001; Set × Format interaction, p < .05; Set × Format × Type interaction, ns; Set × Format × Answer interaction, 
p < .06. 
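
The Type and Answer columns of the tables appear to be marginal means of the four Type × Answer cells. The Python sketch below checks this reading against the Comparative-Equation row of Table 3 (.95, .95, .93, .93). It assumes unweighted means of equally sized cells, which the report does not state explicitly, and the function name marginals is hypothetical.

```python
def marginals(pos_adj, pos_rem, neg_adj, neg_rem):
    """Average the four cell proportions into Type and Answer marginals."""
    return {
        "adjacent": round((pos_adj + neg_adj) / 2, 2),
        "remote": round((pos_rem + neg_rem) / 2, 2),
        "positive": round((pos_adj + pos_rem) / 2, 2),
        "negative": round((neg_adj + neg_rem) / 2, 2),
    }


# Comparative-Equation row of Table 3: Positive A, Positive R, Negative A, Negative R.
row = marginals(0.95, 0.95, 0.93, 0.93)
```

For this row the computed marginals are .94 (adjacent), .94 (remote), .95 (positive), and .93 (negative), which agree with the printed values.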

The two-way interaction between prob- 
lem-solving set and meaningfulness of for- 
mat, MS = 1.93; F(1, 76) = 4.76, p < .05, re- 
flects a pattern in which comparative set has 
about the same overall effect on both pre- 
sentation formats but numeric set has a 
detrimental effect on equation format rela- 
tive to story format. Apparently, either 
comparative problem-solving set or meaningful presentation format
results in a new cognitive structure with less distracting detail
(i.e., fewer numbers). The numeric/equation subjects were, however,
encouraged to retain exact information, resulting in a structure with
extraneous information that could interfere with the acquisition or
use during problem solving of information required for nonquantitative
solutions. As with the above analysis, there were no reliable
differences among the four treatment groups in answering adjacent
versus remote questions (Set × Format × Type interaction, MS = .03, F
< 1), but there was a marginally reliable three-way interaction among
set, format, and answer, MS = .08; F(1, 76) = 3.88, p < .06, in which
the numeric/equation group was far more likely to respond “yes” 
than the other groups. These results are 
consistent with the idea that if subjects re- 
ceive prior problem-solving practice only 
with calculating quantitative answers and 
are presented information in equation rather 
than meaningful form, then their encoding 
of the information is likely to be more rigid 
and detailed and their decision criteria 
during problem solving will favor “yes” for 
comparative questions. 


Experiment 2 


Experiment 2 involved a replication of 
Experiment 1 but used a more complex way 
of presenting the information, that is, pre- 
senting remote pairs only, such that atten- 
tion to quantitative relations was needed 
even to make nonquantitative inferences. 


Method 


Subjects and design. Subjects were 80 Indiana 
University students drawn from the same pool as in 
Experiment 1; the design was identical to that in Experiment 1. 

Materials. The five passages were based on the same 
linear orderings as in Experiment 1, except that each 
passage presented the three remote pairs (remote-link 
organization), that is, A-C, B-D, and A-D instead of the 
adjacent links. For example, the target passage was 
expressed as follows: 


Story: In a certain forest the animals are voting for 
their leader. The frog gets ten times as many votes 
as the rabbit. The hawk gets twenty times as many 
votes as the bear. The frog gets forty times as many 
votes as the bear. 


Equation: F = 10 × R, H = 20 × B, F = 40 × B. 


The questions were identical to those used in Experi- 
ment 1. 
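
With remote-link organization, every adjacent relation has to be inferred from the stated remote pairs. The Python sketch below shows one way to recover the adjacent links for the target passage. It is an illustration only: the variable names are hypothetical, and fixing the bear at one vote is an arbitrary choice of unit.

```python
from fractions import Fraction

# Remote links stated in the Experiment 2 target passage.
REMOTE = {("F", "R"): 10, ("H", "B"): 20, ("F", "B"): 40}

# Fix the smallest term at 1 vote (arbitrary unit) and solve for the others.
votes = {"B": Fraction(1)}
votes["H"] = REMOTE[("H", "B")] * votes["B"]
votes["F"] = REMOTE[("F", "B")] * votes["B"]
votes["R"] = votes["F"] / REMOTE[("F", "R")]

# Adjacent links are never stated; each must be derived as a ratio.
adjacent = {
    ("F", "H"): votes["F"] / votes["H"],
    ("H", "R"): votes["H"] / votes["R"],
    ("R", "B"): votes["R"] / votes["B"],
}
```

The derived adjacent links (2, 5, and 4) are the same relations that were stated directly in Experiment 1, which is why the questions could be kept identical across the two experiments while the roles of retention and inference were exchanged.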

Procedure. The procedure was identical to that used 
in Experiment 1. 


Results 


The results were collected, scored, and analyzed as in Experiment 1.
Table 4 shows the proportion correct response for the four treatment
groups by question type and answer and is comparable with Table 1.
There was an overall improvement from .68 to .74 to .75 to .80
proportion correct across passages, MS = 19.27; F(3, 228) = 8.83, p <
.001, but since there were no reliable interactions involving passage,
data were summed over this variable. As in Experiment 1, subjects
solving only comparative questions performed well on making inferences
(type adjacent) but worse on retention questions (type remote)
relative to subjects solving only numeric problems, Set × Type
interaction, MS = 9.20; F(1, 76) = 34.07, p < .001. There was also, as
in Experiment 1, a strong tendency to say “no” with numeric questions,
Set × Answer interaction, MS = 90.…; F(1, 76) = 121.71, p < .001, and
this tendency was particularly strong for numeric questions involving
inference, Set × Type × Answer interaction, MS = 32.63; F(1, 76) =
181.27, p < .001. These results are consistent with the conclusions
drawn earlier that numeric inferences and comparative inferences rely
on two different processes or levels of processing. One interesting
difference in the method of Experiment 2 is that subjects must attend
to the quantitative relationships even to make comparative inferences;
however, even under these conditions there is no evidence that
comparative subjects calculated the exact numerical relationships for
adjacent (inferred) pairs and then noticed which was larger. As in
Experiment 1, there was an overall superiority for subjects working
with information presented within the context of a meaningful story
over subjects working with the same information presented as
equations, MS = 13.30; F(1, 76) = 19.77, p < .001, but there was no
difference in the pattern of inference ability (Format × Type
interaction, MS = .04, F < 1). Thus, as in Experiment 1, there is no
evidence that presentation format influenced the type of reasoning
process. 

In comparing the pattern of performance of the groups in Experiment 1
with performance in Experiment 2, it is clear that presentation
organization (i.e., presenting adjacent pairs vs. remote pairs) had an
effect on the pattern of problem-solving performance; for example, for
each treatment group the difference in performance on remote and
adjacent items was greater (or in the reverse direction) with
remote-link organization relative to adjacent-link organization,
Experiment × Type interaction, MS = 6.00; F(5, 760) = 40…, p < .005.
This result suggests that the complex information presented in
Experiment 2 is not assimilated or integrated to form the same
cognitive structure as in Experiment 1. That the effect of
presentation organization was not as strong for the comparative
questions, Experiment × Set × Type interaction, MS = 2.92; F(5, 760) =
19.44, p < .001, suggests that numeric reasoning demands a more exact
encoding of the information as presented, while comparative reasoning
allows subjects to “recode” or assimilate the information. 


Table 4 
Proportion Correct Response on Set-Inducing Passages for Four Groups by Type and Answer 
of Question in Experiment 2 

                        Type and answer of question     Type of         Answer of
Treatment               Positive        Negative        question        question
(Set-Format)            A      R        A      R        A      R        Positive  Negative
Comparative-Story       .78    .92      .79    .93      .79    .92      .85       .86
Numeric-Story           .40    .90      .85    .85      .63    .87      .65       .86
Comparative-Equation    .69    .70      .64    .71      .67    .70      .69       .68
Numeric-Equation        .11    .89      .94    .80      .52    .85      .50       .87

Note. A = adjacent; R = remote. The MS for residual error is .18; each cell is based on 12 observations of 20 subjects. Main effect 
of set, p < .001; main effect of format, p < .001; Set × Type interaction, p < .001; Set × Answer interaction, p < .001; Set × Type 
× Answer interaction, p < .001; Format × Type interaction, ns. 

Table 5 shows, like Table 2, the proportion correct response on the
numeric and comparative questions in the target passage for the four
treatment groups. As in Experiment 1, an analysis of variance revealed
that subjects who had solved only numeric problems on the first four
passages excelled on the 12 numeric questions and performed worse on
the 12 comparative questions relative to the comparative set subjects,
Set × Quantification of Question interaction, MS = 2.55; F(1, 76) =
12.77, p < .001. Even though comparative set subjects would be
expected to pay more attention to the quantities in Experiment 2,
there is evidence that the two groups acquired cognitive structures
which differed in amount of detail encoded. 


Table 5 
Proportion Correct Response on Target Passage for Four Groups by Quantification of 
Question (All Questions) in Experiment 2 

Treatment               Quantification of question
(Set-Format)            Comparative     Numeric
Comparative-Story       .90             .70
Numeric-Story           .87             .78
Comparative-Equation    .79             .69
Numeric-Equation        .64             .72

Note. The MS for residual error is .20; each cell is based on 12 
observations of 20 subjects. Set × Quantification interaction, 
p < .001; Format × Quantification interaction, p < .005. 

Also as in Experiment 1, subjects who read 
meaningful stories performed much better 
on comparative questions than on numeric 
questions, while the trend was weaker or 
reversed for subjects working with nonsense 
equations, Format X Quantification of 
Question, MS = 1.88; F(1, 76) = 9.37, p < 
.005. Apparently, even in a very complex 
situation such as Experiment 2, equations 
can result in a more specific kind of encoding 
than meaningful context-related stories. 

An analysis of the pattern of performance 
of subjects on the 12 comparative problems 
shown in Table 6 revealed that both numeric 
and comparative set groups performed about 
an average of 15 percentage points less cor- 
rect on adjacent type (i.e., requiring infer- 
ence) than on remote type problems (i.e., 
requiring retention). That the Set X Type 
interaction failed to reach statistical signif- 
icance (MS = .05, F < 1) extends the results 
of Experiment 1 and strengthens the con- 
clusions that subjects expecting to use dif- 
ferent processes of problem solving struc- 
tured the information in the same way. That 
the comparative set subjects performed 
Table 6

Proportion Correct Response on Target Passages for Four Groups by Type and Answer of
Question (Comparative Questions Only) in Experiment 2

                        Type and answer of question
                        Positive       Negative       Type of question    Answer of question
Treatment               A      R       A      R       A      R            Positive   Negative
Comparative-Story      .80   1.00     .78   1.00     .79   1.00             .90        .89
Numeric-Story          .73    .98     .77    .97     .75    .98             .86        .87
Comparative-Equation   .68    .78     .78    .92     .73    .85             .73        .85
Numeric-Equation       .55    .60     .68    .72     .62    .66             .58        .70

Note. A = adjacent; R = remote. The MS for residual error is .06; each cell is based on 3 observations of 20 subjects. Main effect
of set, p < .005; main effect of format, p < .001; Set X Type interaction, ns; Set X Answer interaction, ns; Format X Type
interaction, p < .005; Format X Answer interaction, p < .10; Set X Format interaction, p < .05.


better overall than the numeric set, MS =
2.11; F(1, 76) = 10.66, p < .005, is consistent
with the results of Experiment 1 that the
comparative set subjects encoded material
in less detail and hence in a more usable
form for comparative problems. Unlike
Experiment 1, there was no evidence of a
difference in the tendency to say “no” between
numeric and comparative set subjects (Set
X Answer interaction, MS = .01, F < 1; Set
X Answer X Type interaction, MS = .05, F
< 1).

An analysis of the data presented in Table
6 further suggested that, unlike Experiment
1, there were strong effects due to presen- 
tation format; these differences support the 
idea that complex information presented as 
a meaningful story is not stored in the same 
way as information presented as nonsense 
equations. Subjects who learned the infor- 
mation as equations showed only an 8% dif- 
ference between adjacent and remote type 
questions, while for subjects working with 
stories the difference was 22 percentage 
points, Format X Type interaction: MS =
1.18; F(1, 76) = 10.28, p < .005. That
subjects working with equations could make 
inferences almost as well as answering 
questions on retention is consistent with the 
idea that the equation encouraged the re- 
tention of specific numbers in the presented 
information; in Experiment 2, such attention 
to the exact numbers and equations was re- 
quired to answer inference questions. In 
addition, subjects using equations showed a 
higher probability of saying “no” than the 


story format subjects, Format X Answer interaction,
MS = .88; F(1, 76) = 3.30, p < .10.
Apparently, answering questions about
complex material and with no feeling of
understanding results in a higher criterion for
recognition.

There was also a pattern of Set X Format
interaction, MS = .88; F(1, 76) = 4.21, p <
.05, similar to that obtained in Experiment
1 in which the numeric/equation group
performed much worse than other groups
and the comparative/story performed best.
One hypothesis that is consistent with this
finding is that the numeric/equation subjects
attempt to encode in exact detail and the
overload of information distracts from
performance on comparative problems; on the
other hand, exposure to meaningful format
or to a comparative set encourages a less
detailed, more integrated encoding of
information.

A comparison of the data summarized in 
Tables 3 and 6 indicates several interesting 
differences between subjects in Experiments 
1 and 2. As would be expected, the more 
difficult and confusing presentation organization
given in Experiment 2 resulted in a
higher tendency to answer “no,” Experiment 
X Answer interaction, MS = .75; F(1, 152) 1 


higher recognition criterion are used if the 
same basic information is presented in a 
more confusing way. Although the added 
complexity of the organization style of
Experiment 2 decreased performance overall,
as might be expected, MS = 4.03; F(1, 152) 
= 12.68, p < .001, there was also evidence 
that presentation organization influenced 
the quality of cognitive structuring as well. 
That the two experiments yielded different 
patterns of performance on adjacent and 
remote type problems, Experiment X Type 
interaction, MS = 2.00; F(1, 152) = 13.74, p 
< .001, again suggests that material is not 
totally assimilated to the same general cog- 
nitive structure in both experiments. 


General Discussion 


Since many tests were conducted in the 
two experiments, I will summarize only the 
main findings here. First, in both experi- 
ments there is clear and consistent support 
for the idea that making inferences with 

quantitative (numeric) information and with
nonquantitative (comparative) information 
are distinctly different processes rather than 
extensions of one another. This claim is 
based on careful analysis of pattern of per- 
formance shown in Tables 1 and 4. 

A main focus of the present experiments 
was to determine whether the same cognitive 
structures support these two different rea- 
soning processes. The two experiments were 
consistent in providing support for the hy- 
pothesis that subjects expecting to answer 
numeric questions encoded in more detail 
(especially, more numbers) than did subjects 
expecting comparative problems, for exam- 
ple, as suggested by comparing the pattern 
of performance of the two sets in Tables 2 
and 5. However, there was no support for the 
hypothesis that different problem-solving 
sets lead to acquisition of qualitatively dif- 
ferent structures, for example, as indicated 
in Tables 3 and 6. In Experiment 1, for ex- 
ample, both problem-solving sets produced 
an effect similar to Potts's (1972) in which 
inference questions were easier or no more 
difficult than retention questions. One
representation of the acquired information that
is consistent with our results is that all 
subjects acquired a spatially arranged linear 
image or list, but that numeric set subjects 
included tags indicating the quantitative 
distance. In addition, although the same
general cognitive structure may support both 
types of reasoning, there is some evidence 
that the recognition criterion (i.e., the ten- 
dency to say “no”) is different for numeric 
and comparative problem-solving sets. 

There is also interesting evidence re- 
garding Mayer and Greeno’s (1975) idea that 
equations are more likely to be stored in a 
way that retains the original presentation 
detail, while meaningful stories are assimilated
and integrated. In the present experiments,
there was some support for the hypothesis
that equation format encouraged
a greater amount of detail in encoding. For
example, as shown in Tables 2 and 5, the
equation subjects, especially with numeric 
set, tended to perform better (a) on numeric 
problems than on comparative problems and 
(b) on problems requiring retention of exact 
quantities. There was also some evidence for 
the hypothesis that the two groups stored 
the information in qualitatively different 
ways; for example, in a complex situation 
such as Experiment 2 (see Table 6), the 
pattern of inference performance is influenced
by format. Again, there was evidence
that equations and stories influence not only
how material is encoded but also the
tendency to say “no.”

Finally, there was evidence that presen- 
tation organization (e.g., simple linear in 
Experiment 1 vs. complex in Experiment 2) 
can also influence the way information is 
structured in memory. One interpretation is 
that the complex organization used in Ex- 
periment 2 was more likely to be retained as 
presented and a simpler organization used 
in Experiment 1 was more likely to be inte- 
grated. These results suggest that the Potts 
(1972) effect may be limited to nonquanti- 
tative (comparative), nontechnical, simple 
material. In mathematics instruction, these 
results suggest that past experience with 
solving only calculation problems, especially 
if presented in low-meaning contexts such as 


1 A detailed analysis of differences in the memory 
structures and solution algorithms of story and equation 
groups is beyond the scope of this article. However, this 
issue is explored, using a different data analysis tech- 
nique and including new data, in Mayer (1978). 
equations, results in a rigid, specific encoding
strategy and poorer transfer to more
interpretive problems.


Causal Influence of Teachers’ Expectations
on Children's Academic Performance: 
A Cross-Lagged Panel Analysis 


William D. Crano and Phyllis M. Mellon 
Michigan State University 


This experiment investigated the causal interplay of teachers' expectations
and children's academic performance. In a 4-year longitudinal study of 4,300
British beginning elementary school children, a series of cross-lagged panel
correlational analyses indicated that the preponderant cause in the achieve- 
ment-expectancy relationship was that of teachers' expectations causing chil- 
dren's achievements to an extent appreciably exceeding that to which chil- 
dren's performance impinged on teachers’ attitudes. Teachers’ evaluations of 
children’s social performance affected later achievement to an extent exceed- 
ing that attributable to academic expectations. The methodological and sub- 
stantive implications of these findings are discussed. 


Whether an individual's expectations 
concerning another's behavior can actually 
influence (or cause) that behavior is an issue 
that has stimulated considerable interest in 
psychology. Of all the variations on this 
theme, the research of Rosenthal and Ja- 
cobson (1968), concerned with the effects of 
teachers’ expectations on the intellectual 
growth of elementary school children, has 
produced perhaps the greatest controversy. 
In their research, these investigators tested 
all students in Grades 1 to 6 of “Oak School" 
with Flanagan's (1960) group-administered 
intelligence test at the beginning of the ac- 
ademic year. This measure was described to 
teachers as an instrument by which intel- 
lectually late-blooming students could be 
identified. Within each classroom, six stu- 
dents were assigned randomly (or semiran- 
domly, see Elashoff & Snow, 1971) to the 
late-bloomer condition, with the remainder 
of the class serving as controls. The teachers 
were told that the experimental subjects
could be expected to demonstrate remark- 
able intellectual improvement over the 
course of the school year. At year's end, all 
students were retested; the results indicated 
significant differences in gain scores in the 
first two grades favoring the experimental 
subjects over the controls. Objections con- 
cerning the inapplicability of Flanagan's test 
in the lower grades (Thorndike, 1968) and 
the noncritical use of simple change scores 
(e.g., see Cronbach & Furby, 1970) in deter- 
mining the validity of the teacher expectancy 
hypothesis prompted a reanalysis of the 
data, with the resulting findings indicating 
no apparent influence of the experimental 
treatment (Elashoff & Snow, 1971). 

The controversy surrounding the appro- 
priate interpretation of the Rosenthal and 
Jacobson (1968) and Elashoff and Snow 
(1971) analyses was reflected in a number of 
attempts at replication which, in general, 
failed to provide unambiguous support for 
either position. On the whole, the studies 
purporting to replicate the original expec- 
tancy findings can be criticized as being 
characterized (a) by interactions when main 
effects were predicted (i.e., Conn, Edwards, 
Rosenthal, & Crowne, 1968); (b) by the linear 
combination of unrelated achievement 
subscales (e.g., Zanna, Sheras, Cooper, & 
Shaw, 1975); (c) by the noncritical use of 
simple gain scores despite a vast psychometric
literature indicating the dangers of
such procedures (e.g., Meichenbaum, Bow- 
ers, & Ross, 1969); and finally (d) by the 
unorthodox application of orthodox statis- 
tical techniques (e.g., Evans & Rosenthal, 
1969). 

On the other hand, the studies which failed
to replicate the Rosenthal findings, while 
usually characterized by a somewhat more 
refined grasp of statistics and research de- 
sign (Fleming & Anttonen, 1971; Mendels & 
Flanders, 1973; O’Connell, Dusek, & 
Wheeler, 1974), often appeared to bias the 
research operations against a positive result. 

In some experiments (e.g., José & Cody, 
1971), the impact of the manipulation of 
teachers’ expectations was assessed after a 
single academic semester had elapsed; it is 
conceivable that the effect of the manipu- 
lation would be apparent only after an entire 
year, as in the original research. Other ex- 
periments (Claiborn, 1969; Fielder, Cohen, 
& Feeney, 1971) initiated the expectancy 
induction after the beginning of the second 
semester. It is possible that teachers’ ex- 
pectations had already developed on the 
basis of children’s performance over the 
course of the first semester, however, and 
thus the experimental treatment was simply 
not strong enough to effect a change. These 
findings would not necessarily indicate that 
expectancies did not operate, but merely 
that the particular experimental treatments 
employed were not sufficient to counteract 
naturally occurring expectations. This pos- 
sibility appears especially plausible in light 
of the results of Dusek and O’Connell (1973), 
who found that naturally occurring teacher 
expectations were significantly correlated to 
students' academic accomplishments, while 
their attempted experimental manipulation 
of expectancies had no effect on later 
achievement. 

Studies designed to avoid the potential
limitations of induced teacher expectancies
have made use of naturally occurring
expectations in the attempted determination
of the strength of this effect. While the
weight of evidence is far from overwhelming,
a positive relationship between teachers’
natural expectations and children’s
achievement has been reported in several
studies (Palardy, 1969; Seaver, 1973;
Sutherland & Goldschmid, 1974). Unfortunately,
given the extreme selection biases that can
operate in research of this type, statements
of a causal nature are precluded. Consider,
for example, the procedures employed in
Sutherland and Goldschmid’s (1974)
investigation. In their study, first-grade teachers’
expectations of their pupils’ academic
potential were collected after the second month
of school. The Wechsler Intelligence Scale
for Children and Lorge-Thorndike
intelligence test were also administered to the
children then and again 5 months later. For
data analytic purposes, change in test scores
from first to second test administration was
inspected as a function of teachers'
expectations. Despite being couched in an analysis
of variance design, the necessary nonrandom
self-selection of subjects to conditions
renders this study correlational; thus, even if a
positive linear relationship between
expectations and IQ change were observed, the
most that could have been said of such a
result was that there existed a relationship
between these two variables. Statements
concerning the causal dependence of one of
these variables upon the other, however,
would not have been justified.

The Sutherland and Goldschmid (1974)
study emphasizes the fundamental difficulty
of research in this area. For any of a number
of possibilities it seems apparent that
experimentally induced teacher expectations
do not impinge strongly upon children's
academic achievement. Naturally occurring
teachers’ attitudes might well affect
children, however, and, from a pragmatic point
of view, such expectations are by far the
more interesting (see Finn, 1972).
Unfortunately, in employing such expectancies in
assigning subjects to experimental
conditions, one limits the outcome of the research
to, at best, a statement of the obvious. We
know that teachers' assessments and
children's achievements are related. It would be
a very poor teacher who, on the basis of past
training and experience, could not
differentiate children in terms of academic
potential.

The critical issue, however, is that of
causation: Do children’s characteristics and
actions cause teachers’ expectations or do
teachers’ expectations cause children’s
academic achievement? The first alternative is
certainly plausible, and one would be hard
put to criticize teachers for forming different
impressions of children as a function of
differences in children’s performances. The
latter (Rosenthal & Jacobson, 1968)
alternative, however, is also plausible, but given
the apparent limitation of research in this
area to simple correlational techniques, no
reasonable choice between these two options
can be made.

In light of recent methodological
developments, however, dependence on such
techniques is no longer necessary. With the
introduction of the cross-lagged panel
correlational method (Campbell, 1963;
Campbell & Stanley, 1963; Pelz & Andrews, 1964),
causal inferences based on correlational data
obtained in longitudinal panel studies can be
made and enjoy the same logical status as
those derived in the more standard
experimental settings. The logic of this method has
been discussed elsewhere (e.g., Crano, 1974),
as have some of its problems and limitations
(Rozelle & Campbell, 1969) and their
possible solution (Crano, Kenny, & Campbell,
1972; Kenny, 1973). Briefly, this technique
recognizes time precedence as the basis of all
causal inference. If an event consistently
precedes the occurrence of another and the
opposite is not the case, then one of two
scientifically acceptable explanations is
possible: One possibility is that the first event
operated as a cause of the second;
alternatively, both events might be the effects of
some unspecified third factor. Temporal
control of the independent variable in
experimental situations effectively precludes
this second alternative and forms the basis
of experimental-causal inference. In simple
correlational designs, variables are free from
temporal constraints and the power of causal
inference is thus forfeit. But in a panel
design, that is, one employing the same subject
sample and the same set of variables over a
number of administrations, time order can
be brought to bear in the causal elucidation
of correlational data. Consider a situation in
which students’ performance scores and
teachers' expectations or evaluations of the
students were both available at two time
points. The pattern of possible correlations
would take the form presented in Figure 1.




Suppose for the sake of later exposition that 
teachers’ expectations (E) measured at Time 
1 were followed at Time 2 by a consistent 
pattern of children’s academic achievement 
(A), such that high status on E was always 
followed by high status on A. Suppose fur- 
ther that the opposite relationship did not 
hold, that is, that status on A at Time 1 held 
no implication for E at Time 2. In such a 
case, the cross-lagged correlation $r_{E_1A_2}$ would
exceed $r_{A_1E_2}$, and such discontinuity, in the
absence of the operation of an unspecified 
third (causal) variable, would suggest that 
teachers’ expectations served as a cause 
(though possibly only one of many) of chil- 
dren's later achievement. 

If the opposite held (i.e., if $r_{A_1E_2} > r_{E_1A_2}$),
we would be justified in concluding that
children's achievement operated causally
with respect to teachers' expectancies; the
final possibility, $r_{A_1E_2} = r_{E_1A_2}$, would indicate
either the absence of a functional relation- 
ship between the two variables or the oper- 
ation of a more general causal influence 
which impinged equally upon both mea- 
sures. 
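
The comparison of the two cross-lagged correlations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores for eight children are hypothetical, the helper names (`pearson_r`, `cross_lagged`, `preponderant_direction`) are introduced here for exposition, and the sketch labels the larger correlation descriptively without applying any test of the difference between dependent correlations.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cross_lagged(e1, a1, e2, a2):
    """Return the two cross-lagged correlations of a two-wave panel."""
    return {
        "r_E1A2": pearson_r(e1, a2),  # Time 1 expectancy with Time 2 achievement
        "r_A1E2": pearson_r(a1, e2),  # Time 1 achievement with Time 2 expectancy
    }


def preponderant_direction(panel):
    """Descriptive label only; no significance test is applied here."""
    if panel["r_E1A2"] > panel["r_A1E2"]:
        return "E -> A"
    if panel["r_A1E2"] > panel["r_E1A2"]:
        return "A -> E"
    return "no preponderance"


# Hypothetical scores for eight children (illustration only, not study data).
e1 = [1, 2, 3, 4, 5, 6, 7, 8]
a1 = [2, 7, 1, 8, 2, 8, 1, 8]
e2 = [4, 5, 3, 6, 5, 4, 6, 5]
a2 = [1.5, 2.0, 3.5, 3.5, 5.5, 6.0, 6.5, 8.5]

panel = cross_lagged(e1, a1, e2, a2)
direction = preponderant_direction(panel)
print(panel, direction)
```

In these made-up data, Time 2 achievement tracks Time 1 expectancy closely while Time 2 expectancy is only weakly related to Time 1 achievement, so the sketch reports the expectancy-to-achievement direction as the larger one.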

It should be made clear that these com- 
parisons provide information of a relativistic, 
rather than absolute, nature. It is quite 
conceivable, for example, that a child's 
achievement might affect later teacher ex- 
pectations, which in turn might impinge 
upon later performance. It seems equally 
probable, however, that one of these causal 
vectors is preponderant, and it is with the 
identification of the preponderant causal 
vector that this research is concerned. Stated 
briefly, this study seeks to test in a cross- 
lagged panel correlational design the prop- 
osition suggested by Rosenthal and Jacobson 


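[Figure 1 diagram: expectancy measures (Exp) E1 and E2 at times t1 and t2 shown above achievement measures (Ach) A1 and A2, with the correlations among the four measures marked on the connecting lines.]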


Figure 1. Schematic representation of a cross-lagged 
panel correlational design involving teachers’ expec- 
tations (E) and children’s achievement (A). (Exp = 
expectancy; Ach = achievement.) 
(1968) that $r_{E_1A_2} > r_{A_1E_2}$ and to contrast this
hypothesis with the alternative possibilities
of no functional relationship ($r_{A_1E_2} = r_{E_1A_2}$)
or one in which achievement operates as the
preponderant cause of teacher expectations
($r_{A_1E_2} > r_{E_1A_2}$).¹


Method 


Sample


The data of this investigation were collected originally
in connection with a study of streaming, or homoge- 
neous ability grouping, in the junior (elementary) 
schools of England and Wales (see Barker Lunn, 1970). 
In the English system, children generally begin infant 
school in the term following their fifth birthday and 
junior school in the September following their seventh. 
Typically, these two types of school do not share the 
same building or teaching staff. 

A cohort of first-year junior school children older than 
7 years in 1964 constituted the sample, which was tested 
at yearly intervals for 4 years on a host of social, at- 
titudinal, and academic variables. Originally, a matched 
group of 50 streamed and 50 nonstreamed schools made 
up the sample (and the sampling unit), but over the 
course of the project, a number of organizational 
changes (schools changed from streamed to non- 
streamed and vice versa) reduced the final total to 72 
schools, 36 of each type. Comparative analyses disclosed 
that the reduction in sample had no systematic influ- 
ence on the results of the investigation. Over all 4 years 
of the study, generally complete information on 5,200 
children was available. 


Measures 


Expectancies. Consistent with the approach of West
and Anderson (1976, p. 616), expectancies in this study 
were defined as “a teacher's attitude (behavioral pre- 
disposition) about a specific student.” We think it is 
reasonable to assume that different behaviors are associated
with these variations in attitude. The operationalization
of teachers’ attitudes is provided in the
data of Barker Lunn (1970). Near the end of each school 
year in her study, Barker Lunn’s teachers rated the 
children individually in their classes on a number of 
measures. In the first year, three such measures were 
collected. The first of these, employing a 4-point scale, 
was concerned with each child’s attitude to school work 
and asked the teacher to note whether the child was a 
very hard worker, a hard worker, average, or lazy. A 
second expectancy measure was concerned with the
frequency of a child’s disobedience, asking was the 
student disobedient never or seldom, quite often, or 
most of the time. A final measure, employing the same 
choices as the disobedience rating, was focused on
whether the child was a pleasure to have in class.

In addition to these measures, three additional
teacher ratings were collected in Grades 2 through 4.
Two of these were 3-point scales concerned with the
teachers’ estimates of each child's arithmetic or reading
ability: Was the child among the top five to seven students
in class with respect to arithmetic or reading


The sixth and final measure demanded a global rating
of general ability for each child: Was the child (a) certain
of grammar school placement, (b) above average, possible
grammar school material, (c) average, (d) below
average, possibly backward, or (e) dull and definitely
backward?
Performance. In-class, group-administered
achievement tests of a standard objective format were
employed at the end of each of the 4 years of the original
study, approximately contiguously with the expectancy
measures. Parallel forms of most of these tests were
constructed and presented in alternate years. The
particular form employed first was essentially random
from school to school. Sufficiently lenient time periods
were allowed in administration to consider these measures
as tests of power rather than speed. A complete
description of the tests, their content and psychometric
qualities, was presented by Barker Lunn (1970).²

Reading consisted of 48 incomplete sentence items,
each listing five possible completions. Children were to
choose the alternative which best completed each sentence.
Items were arranged in order of difficulty and
were appropriate for children in the 7 to 11 and older age
range. Coefficients of internal consistency
(Kuder-Richardson 20) over both test forms and the 4
administration years ranged from .93 to .95.

The English test was composed of 64 items of graded
difficulty, focused principally on assessment of
comprehension but also included tests of spelling and formal
grammar. Reliabilities ranged from .96 to .97.

Problem arithmetic consisted of 30 problems of
graded difficulty. The first 10 items were read to the
children to overcome possible reading deficiencies.
Preliminary study had indicated that children unable
to read were not likely to answer further questions.
Reliabilities ranged from .88 to .93.


1 Rozelle and Campbell (1969) have pointed out that
four, rather than two, rival alternative hypotheses must 
be considered in the 


related negatively to children's later performance or
that prior performance might impinge negatively on
later expectancies. These alternatives appear so
implausible, however, that they were not formally
considered. An assessment of the tenability of these rival
hypotheses is presented with the results in which the
preponderance of positive correlations serves to indicate
the implausibility of these alternate possibilities (see
Crano et al., 1972, p. 275).
2 Access to Barker Lunn's data was provided by the
National Foundation for Educational Research in England
and Wales. We are grateful to this organization
and to the British Social Science Research Council’s
Survey Archive at the University of Essex, Colchester,
England, where the data were stored, for their generous
assistance on this project.
The mechanical arithmetic test, which contained 35
items for the first-grade students and 44 items for
Grades 2 through 4, focused on addition, subtraction,
multiplication, and division. Some items in the later test
assessed knowledge of fractions, money, and common
weights and measures. Reliabilities ranged from .89 to
.95.

Concept arithmetic represented a departure from
traditional measures in that it concentrated more on the
understanding of mathematics and less on computa- 
tional skills per se. The test was composed of 52 items, 
the first 12 of which were administered orally. Only a 
single form of this test was constructed. This test was 
not administered to children in the first grade. Relia- 
bilities ranged from .94 to .96. 

In the third and fourth years, an 80-item verbal/ 
nonverbal ability test with 40 items of each type pre- 
sented in alternation was administered. No parallel 
form of this test was used. Reliabilities of the total test 
ranged from .93 to .94. In Year 2, a verbal ability scale 
of 85 items ($r_{\text{KR-20}} = .97$) was used but no complementary
test of nonverbal ability was administered.
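
The Kuder-Richardson 20 reliabilities reported for the tests above can be illustrated with a short Python sketch. It assumes dichotomously scored (0/1) items and uses the population variance of total scores, which is one common convention; the function name `kr20` and the five-examinee response matrix are hypothetical and are not taken from Barker Lunn's data.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for a matrix of 0/1 item scores.

    responses: one list per examinee, each holding k item scores (0 or 1).
    Uses the population variance of total scores (an assumed convention).
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance; KR-20 is undefined")
    # Sum of item variances p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses of five examinees to four items (illustration only).
example = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
reliability = kr20(example)
print(round(reliability, 3))
```

Because the coefficient depends on the ratio of summed item variances to total-score variance, long tests of graded difficulty such as those described here can readily yield values in the .90s.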


Results 


General Matrix Characteristics 


After all relevant expectancy variables 
were reflected so that high scores indicated 
positive teacher attitudes, all possible cor- 
relations were computed between expec- 
tancy and performance variables across 
contiguous administration years. As the 
cross-lagged panel correlational technique 
demands repeated (identical) measures be- 
tween panels, the comparison of Grade 1 
with Grade 2 scores consisted of only seven 
variables (three expectancy and four 
achievement measures). The remaining 
comparisons (involving Year 2 with Year 3 
and Year 3 with Year 4 scores) allowed for 
tests between all six expectancy and six 
performance measures.³

A perusal of the complete matrix of correlations
revealed not a single negative relationship;
therefore, as discussed earlier, the
Rozelle and Campbell (1969) qualification 
to the general cross-lagged panel correla- 
tional technique does not appear to apply in 
the present research. Negative relationships 
between teacher ratings and children’s per- 
formance were not encountered; thus, only 
two of the four possible rival alternative 
hypotheses relating expectancies and 
achievement need be considered here. 

The mean test-retest teacher-rating correlation
varied from a rather weak relationship
in the Grades 1-2 comparison ($\bar{r}_{E_1E_2} = .289$)
to a more substantial figure in the
Grades 2-3 ($\bar{r}_{E_2E_3} = .550$) and Grades 3-4
($\bar{r}_{E_3E_4} = .562$) comparisons. The mean
test-retest correlations between achievement
measures over all adjacent years of the
research were substantial ($\bar{r}_{A_1A_2} = .814$;
$\bar{r}_{A_2A_3} = .831$; $\bar{r}_{A_3A_4} = .864$). Mean synchronous
(within-grade) correlations relating teacher
ratings and children’s performance were also
reasonably consistent over all 4 years of the
investigation ($\bar{r}_{E_1A_1} = .364$; $\bar{r}_{E_2A_2} = .429$;
$\bar{r}_{E_3A_3} = .435$; $\bar{r}_{E_4A_4} = .438$). As the cross-lagged
panel correlational technique demands a
high degree of comparability between the
synchronous correlations involved in the
adjacent years' comparisons (see Crano et
al.’s, 1972, discussion of stationarity), these
results are both necessary and encouraging.

The mean synchronous Year 2 correlation
that was observed in the Years 1-2 comparisons
($\bar{r}'_{E_2A_2} = .332$) was more similar to the
initial year’s mean synchronous correlation
($\bar{r}_{E_1A_1}$) than that presented above, since
these comparisons involved only 7 variables
rather than the complete set of 12.


Cross-Lagged Panel Analyses 


A summary of the zero-order cross-lagged 
panel expectancy-performance correlations 
over all adjacent years of this study is pre- 
sented in Table 1, along with the results of 
a statistical comparison between these cor- 
relations by a t test devised by Pearson and 
Filon (1898). The alpha level employed in 
this table has been adjusted by a procedure 
suggested by Dunn (1961), to reflect that a 
total of 84 nonindependent statistical com- 
parisons was undertaken. Thus, throughout 
this study, only those t differences that ex- 
ceeded the value of 3.48 (df > 4,300) were 
deemed statistically reliable at the p < .05
level.
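
As a rough check on the adjusted criterion, the sketch below computes a Bonferroni-style (Dunn) critical value under a normal approximation, which seems reasonable with more than 4,300 degrees of freedom. The even split of alpha over 84 comparisons and the two-tailed test are assumptions; the text does not say how the 3.48 figure was read from Dunn's tables, so the computed value need not reproduce it exactly.

```python
from statistics import NormalDist


def bonferroni_critical_z(alpha: float, n_comparisons: int) -> float:
    """Two-tailed critical z after splitting alpha evenly over all comparisons."""
    per_test_alpha = alpha / n_comparisons
    # Upper-tail cutoff for half of the per-test alpha (two-tailed test assumed).
    return NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)


# 84 comparisons at an overall alpha of .05, as described in the text.
critical = bonferroni_critical_z(alpha=0.05, n_comparisons=84)
print(round(critical, 2))
```

Under these assumptions the cutoff falls slightly below the 3.48 used in the study, so the published criterion is, if anything, the more conservative of the two.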


3 Ideally, all three sets of comparisons would have 
included information on all measures. In investigations 
involving the secondary analysis of data, conditions are 
rarely ideal. Nevertheless, though gathered for purposes 
very different from those of the present investigation, 
the fit of these data to the research question could not
have been much more complete.

Table 1
Summary of All Cross-Lagged Panel Correlational Comparisons Between Contiguous Administration Years

            Grades 1-2                          Grades 2-3                          Grades 3-4
Variable    $r_{E_1A_2}$  $r_{A_1E_2}$  t       $r_{E_2A_3}$  $r_{A_2E_3}$  t       $r_{E_3A_4}$  $r_{A_3E_4}$
i with: (A 

Reading .130 .414 -15.70 .420  .414 .51 .428  .406
English .162 .431 —15.02 .465  .462 26 480 457 
Mechanical arithmetic .137  .400 —14.38 .491  .440 3.98 512 Pus 
Problem arithmetic .098  .391 —15.87 .449  .456 | —.53 .489 dn 
Concept arithmetic .463  .449 1.13 ig Pun 
Verbal IQ 429 457 -2.66 .452 42: 

NO iubar ios -284  .191 5.34 .204  .178 1.60. .198 149 
English .295  .192 5.97 .251  .206 2.88 .212  .202 
Mechanical arithmetic -284  .195 5.07 .244  .186 3.65 194 .189 
Problem arithmetic -264  .135 724 .214 151 3.89 .163  .162 
Concept arithmetic .228 — 177 2.81 1 87 183 
Verbal IQ .220 .186 2.06 .209  .169

n with: (A; 

Eos is .690  .285 27.36 .301  .259 2,01 .283  .225 
English -732  .287 30.81 .339  .275 4.04 .296 -261 
Mechanical arithmetic .603 .286 20.52 .339  .260 4.88 .285  .249 
Problem arithmetic .658  .240 27.34 .315  .244 434 .271 224 
Concept arithmetic 823  .253 4.32 270 -251 
Verbal IQ 297 274 145 .282 .231 
Reading rating (E) with: (A)

* Reading 2 9 .509 .511 .525 .517 
English .485 .457 524 -489 
Mechanical arithmetic .320 .303 3503  .331 
Problem arithmetic .949  .353 -880 367 
Concept arithmetic .369  .372  —.19 .404 .328 
Verbal IQ -40  .478 -5.40 .455  .16 

Arithmetic rating (E) with: (A)
Reading 383 .389 —.38 .406 406 
English 399 .372 2.4 4422 413 
Mechanical arithmetic .437  .410 — 2.11 .461  .457 
Problem arithmetic 458 445 101 .486  .471 
Concept arithmetic 447 439 60 .481  .467 
Verbal IQ 391 .415 -1.77 .494 422 

Ability rating (E) with: (A) 

Reading .692  .700 -1.10 .685  .715
English -132  .730 7 .725  .146 
Mechanical arithmetic .652  .637 1.74 .687  .690 
Problem arithmetic 686 .689 —42 695 .715 
Concept arithmetic -706 .680 3.25 .722 .742 
Verbal IQ .667 719—648 .703  .706 


Note. E = teachers’ expectation; A = students’ achievement. As discussed in the text, for t > 3.48, p < .05 for all entries.


In Table 1, a positive t value indicates that 
an early expectancy measure was more pos- 
itively related to a later performance indi- 
cator than the converse, and thus the pre- 
ponderance of causation was in the direction 
of an early expectancy causing a later 
achievement. Conversely, a negative t value 
indicated that early performance impinged 
upon a later evaluation to an extent ex- 
ceeding that to which the early evaluation 
affected performance. For example, consider 
the first three entries of the first row of the 
table. The correlations indicate that children’s
reading achievement in Grade 1 was
strongly related to teachers’ evaluations of
the same subjects’ attitude to school in the
second grade ($r_{A_1E_2}$ = .414), while teacher
estimates of children’s attitude toward
school in the first grade were not substantially
related to reading performance in the
second ($r_{E_1A_2}$ = .130). Statistical comparison
disclosed a significant difference between
these correlations, t(4,300) = -15.70, p <
.001, suggesting that the preponderant
causal sequence was in the direction of children's
early (first-grade) reading skills operating
as a cause of later (second-grade)
teacher estimates of the children's attitude
to school.4 This finding appeared to have
some generality, as second-grade teachers' 
assessments of children's attitude to school 
were consistently related to the children's 
first-grade achievements in English, me- 
chanical arithmetic, and problem arithmetic 
to an extent significantly exceeding that to 
which the early evaluation of school atti- 
tudes influenced performance in the second 
grade, all ts(4,300) > -14.3, p < .001. This
finding was not replicated in the Grades 2-3 
and 3-4 comparisons, however. 

One consistency that did emerge in these 
later data panels involved the effects of 
teachers’ estimates of their students’ general 
ability. In the available comparisons of Table 
1 (recall that this measure was not collected 
in Grade 1), it appears that the evaluation of 
general ability was influenced significantly 
by children’s previous year’s academic per- 
formance, to an extent appreciably exceed- 
ing that to which ability evaluations in- 
fluenced later performance. 

The major consistency of Table 1, how- 
ever, involved the almost ubiquitous pattern 
of positive t values. To be sure, correlations 
involving school attitudes in the first com- 
parison, and general ability in the later ones, 
reversed this general trend, but it should be 
recognized that these were mentioned be- 
cause they represented departures from the 
norm. Over all years of this investigation, 
positive t differences occurred in 62 of 84 
possible comparisons, indicating the causal 
primacy of teachers’ evaluations and ex- 
pectations on children’s later performance. 
This is not to say that children’s early per- 
formance did not influence later evaluations, 
but rather that the effect of early expecta- 
tions and evaluations on performance far 
outweighed the impact of early performance 
on later expectations.

A summary consideration of the statisti- 
cally reliable differences (by Dunn's, 1961, 
criterion) observed in Table 1 supports this 
general observation. In the Years 1-2 com- 
parisons, for example, 8 of 12 statistically 
significant t differences indicated the causal
primacy of expectations on children’s
achievements. In the Years 2-3 comparisons,
7 of 9 significant differences were consistent
with these findings; of the remaining t differences,
18 of 27 were positive. In the final
Grades 3-4 comparisons, 4 of 5 significant
differences supported the expectancy →
achievement hypothesis; of the remaining 31
t differences, 25 were positive. Over all
comparisons, the general data pattern of 
Table 1 provides strong support for the 
Rosenthal and Jacobson (1968) expectancy 
hypothesis. 


The Teacher as Veridical Observer 


A potential problem in the interpretation 
of significant differences in cross-lagged
panel correlations arises in this study, as 
both the achievement and expectancy mea- 
sures were gathered at the end of each school 
year. Conceivably under these circum- 
stances, cross-lagged correlations could re- 
flect merely the relationship between con- 
tiguous expectancy and performance mea- 
sures that had developed over the course of 
the first (or temporally prior) measurement 
panel. For example, consider the relationship 
involving children's problem arithmetic 
scores and their teachers' ratings on pleasure 
to have in class, as presented in the Grades 
1-2 comparison of Table 1. On the basis of 
the zero-order correlations, it seems appar- 
ent that the teachers' perception of the 
pleasantness of the children in the first grade 
impinged strongly on children’s problem
arithmetic scores in Grade 2 ($r_{E_1A_2}$ = .658),
while the children’s problem arithmetic
scores in Grade 1 had little influence on later
(Grade 2) ratings of pleasure ($r_{A_1E_2}$ = .240).
The difference between these two correlations
was substantial, t(4,300) = 27.34, p <
.001, suggesting the causal operation of the
pleasure variable on later arithmetic scores.
But one would be justified in questioning
whether the cross-lagged correlation between
pleasure in Grade 1 and arithmetic in
Grade 2 was reasonably indicative of a true 
causal relationship or merely a reflection of 
the children’s initial arithmetic skills being 
veridically perceived by their teachers, who 


4 Degrees of freedom for all comparisons varied as a
function of missing data (pairwise deletion of missing
data was employed throughout this study). All comparisons
between correlations exceeded 4,300 df, a figure
which was used conservatively in all statistical tests
throughout the study.

were influenced by these skills in coming to
a judgment concerning the pleasantness of 
the child. A conservative solution to the 
teacher-as-veridical-observer alternative 
involves the use of partial correlations, which 
in this instance would entail an inspection of 
the influence of the prior pleasure variable 
upon the later arithmetic measure with the 
children’s initial arithmetic performance 
held constant. The resulting partial cross-
lagged panel correlation ($r_{E_1A_2 \cdot A_1}$ = .392)
provides an indication of the cross-lagged
pleasure-arithmetic relationship which is
relatively independent of the children's initial
(first-grade) status in arithmetic. This
correlation significantly exceeds the complementary
partial cross-lagged correlation
($r_{A_1E_2 \cdot E_1}$ = .073), which reflects the influence
of arithmetic upon later pleasure ratings, 
with earlier pleasure ratings partialed out, 
t(4,300) = 15.81, p < .001. 
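
The partialing step described above can be sketched with the standard first-order partial correlation formula. Only the zero-order cross-lag of .658 comes from the text; the synchronous and stability values are hypothetical placeholders, since the article does not report them for this pair of measures, so the result is illustrative and will not reproduce the published .392.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator


# From the text: pleasure rating in Grade 1 with problem arithmetic in Grade 2.
r_e1_a2 = 0.658
# HYPOTHETICAL: pleasure rating and problem arithmetic, both measured in Grade 1.
r_e1_a1 = 0.60
# HYPOTHETICAL: problem arithmetic in Grade 1 with problem arithmetic in Grade 2.
r_a1_a2 = 0.80

# Cross-lag with initial arithmetic performance held constant.
result = partial_correlation(r_e1_a2, r_e1_a1, r_a1_a2)
print(round(result, 3))
```

The complementary coefficient is obtained the same way by swapping roles: correlate early arithmetic with the later rating while holding the early rating constant.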

This result is especially striking since the 
typical achievement test used in this study 
was 45 times longer than the one-item ex- 
pectancy measures employed throughout 
(see Crano, 1977). These test length differ- 
ences, as expected, had a visible impact on 
the test-retest stability coefficients that were 
observed. Since the reliabilities of the per- 
formance measures always exceeded those 
of the expectancy variables, more was par- 
tialed out of the zero-order expectancy- 
performance coefficients than those in- 
volving the influence of early performance 
on later expectancies. Thus, a simple partial 
correlational approach would bias the in- 
vestigation against the expectancy-causes- 
performance possibility. To offset this possible
bias, a modified partialing approach
was devised, such that the test-retest stability
of each expectancy measure was assumed
to be equal to that of the performance
measure with which it was compared.5 The
outcome of this procedure, which was employed
over all possible expectancy-performance
comparisons, was striking. Although
this process necessarily attenuated the
magnitude of all cross-lagged panel correlations,
the resulting differences between the
correlations were enhanced relative to those
observed in Table 1. This data treatment,
which statistically controlled for the
teacher-as-veridical-observer hypothesis,
resulted in a pattern of cross-lagged panel
correlational differences that suggests even
more strongly than the first analysis that the
preponderant causal relationship in the expectancy-performance
equation is that of
teachers’ expectations causing children’s
later academic performance.6


Discussion


That a relationship exists between
teachers’ evaluations and children’s academic
performance has long been established
and was not at issue in this research.
The causal interpretation of this frequently
observed dependency had yet to be resolved,
however, and represented the central focus
of this investigation. Two competing causal
explanations of the expectancy-performance
relationship have developed and both appear
plausible: The more classical position saw
the teacher as a valid indicator of the various
skills and abilities of the children under observation
and thus the relationship between
expectations and performance was a positive
feature, reflecting the quality of the teacher's
training and insights. Teachers’ expectations,
in this scheme, were implicitly derivative
of, or caused by, the abilities or accomplishments
of their students. The
Rosenthal and Jacobson (1968) alternative
reversed the classical assumption and viewed
the teacher as a causal agent whose varying
expectations effected the differential development
of children's skills and abilities.


5 Lengthening the expectancy measures by 45 times
their original (one item) length by Spearman-Brown
procedures would have resulted in test-retest reliabilities
somewhat stronger than those employed in this
procedure.
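
For reference, the Spearman-Brown projection mentioned in this footnote can be sketched as follows. Using the Grades 1-2 mean teacher-rating stability (.289) as the single-item reliability is an assumption made purely for illustration; the article does not report which coefficients would enter such a projection.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a measure is lengthened by length_factor."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# ASSUMPTION: treat the Grades 1-2 mean rating stability (.289) as the
# reliability of a one-item expectancy measure, then lengthen it 45 times.
projected = spearman_brown(reliability=0.289, length_factor=45)
print(round(projected, 3))
```

Because the projected value rises steeply with the length factor, equating the expectancy stabilities to those of the achievement tests, as the authors did, is the more cautious adjustment.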

6 The use of the partial correlational approach in the
cross-lagged panel correlational paradigm raises a
number of intriguing methodological issues in the
context of the present research. On one hand, this approach
is overly conservative, in that it assumes that
none of the variation which exists between synchronous
correlations is reflective of a causal relationship. Conversely,
it is well established that in situations involving
other than perfectly reliable measures, the partial correlation
always represents an undercorrection. These
two opposing factors operate in the present study, but
as the extremely high reliabilities of measures reduce
the degree of undercorrection, the resultant thrust of
these opposed influences probably is to conservatize the
results (i.e., to overattenuate, somewhat, the zero-order
correlations and hence to reduce t differences between
partialed cross-lagged panel coefficients; see Brewer,
Campbell, & Crano, 1970).

To the extent that these expectations were
formed on the basis of irrelevant social, 
economic, or racial considerations (e.g., Rist, 
1970), they were viewed as unjustifiable. 
While the results of this investigation pro- 
vide some support for the causal implications 
of both of these competing explanations, the 
preponderant causal sequence observed in 
the cross-lagged panel correlational analyses 
of this research suggests that teachers’ ex- 
pectations caused children’s performance to 
an extent appreciably exceeding that to 
which performance influenced expecta- 
tions. 

While this summary observation ob- 
viously favors the Rosenthal and Jacobson 
(1968) position, the strength of the expec- 
tancy effect on children’s later performance 
appears to vary as a function of the nature of 
the particular expectancy measure that was 
employed. Conceptually, two different types 
of teacher expectancies were assessed in this 
research. One set was concerned primarily
with the child’s social skills or conduct: Did 
the child have a positive attitude toward 
school work, was he a pleasure to have in 
class, was he obedient? The second set was 
concerned with the teacher’s evaluation of 
the child’s academic status, as indicated by 
the ratings of the child’s reading and arith- 
metic potential and general ability. Table 1 
indicates that the most positive and consis- 
tent results in favor of the expectancy 
proposition were obtained from those 
teacher evaluations concerned with the 
child’s social, rather than academic, perfor- 
mance. Teachers’ expectations and evalua- 
tions of children’s social development, in 
other words, appeared to exert a greater 
effect on later academic performance than 
those expectations concerned specifically 
with academic potential. Such a finding
further reduces the plausibility of the alternative
teacher-as-veridical-observer hypothesis
discussed earlier, since if teachers’
expectations were merely reflective of the
child’s obvious academic skills, we would 
logically expect the academic expectancy 
measures to have proven more predictive of 
later performance than the social indica- 
tors. 

These findings suggest a possible self-induced
limitation of previous experimental
research conducted in this area and indicate
the desirability of a departure from the ma- 
nipulations typically employed. In the great 
majority of experimental investigations of 
the teacher expectancy effect, attempts were 
made to influence teachers’ expectations 
concerning children’s probable future aca- 
demic performance or intellectual growth. 
We argued earlier in this article that a pos- 
sible explanation of many previous experi- 
mental failures was the weakness of these 
induced expectancies, relative to those which 
occurred as a natural concomitant of the 
everyday classroom interaction. The present 
results reinforce this observation and suggest 
that almost all previous experimental re- 
search in this area, including that of Rosen- 
thal and Jacobson (1968), was performed 
under conditions even more biased against 
a positive result than commonly acknowl- 
edged. 

For future studies of the extent and im- 
pact of the teacher bias effect, the results of 
the present investigation suggest a shift of 
focus from the academic to the more socially 
oriented evaluations and expectations. The
data indicate that at least in the early ele- 
mentary grades, a teacher's affective re- 
sponse toward a child can have a very strong 
impact on that child’s academic perfor- 
mance. If these performance differences are 
then maintained or amplified over the course 
of the child’s academic career, and there is 
considerable evidence to support this pos- 
sibility (e.g., Coleman et al., 1966; Crano et 
al., 1972; Harlem Youth Opportunities Un- 
limited, Inc., 1964), then the importance of 
these social expectancy effects becomes obvious.7


7 This is not to suggest that a positive relationship
between academic expectancies and performance did
not exist. In fact, such cross-lagged correlations were
generally larger than those involving social expectancies
with later performance. The complementary relationship
between performance and later academic expectancies
was also substantial, however, and thus the
strength of the inference concerning the preponderance
of causation between academic expectancies and performance,
based upon differences between the cross-lagged
panel correlations, was attenuated. The pattern
of relationships obtained between these variables is
suggestive of a feedback system, in which changes in one
variable effect changes in the other, which in turn influences
the first (see Crano et al., 1972).

The specific mechanism(s) through which
socially oriented expectations and evalua- 
tions operate on later academic performance 
is not specified in the present research, but 
a number of possibilities can be suggested. 
The partial correlational procedure that was 
employed reasonably precludes the propo- 
sition that the later performance differences 
which favored the positively evaluated 
children were a function of differences in 
previous performance. How, then, did the 
expectancy effect operate? It is possible that 
the mechanism of transmission of the effect 
operated between teachers. In that the ma- 
jority of schools constituting the sample were 
small, it is reasonable to assume that reports 
of a child's social and academic performance 
potential were passed sequentially from one 
teacher to the next throughout the child's 
elementary school career. The test-retest 
correlations between teacher expectation 
measures presented earlier provide support 
for this possibility. Thus the performance of 
a child whose reputation was supposedly 
established in the first grade would be 
viewed as clearly predictable by his second- 
grade teacher, by virtue of prior communi- 
cation with the child's first-grade instruc- 
tor. 

Another possibility, which could operate 
in conjunction with the first, is that a posi- 
tive teacher orientation would foster the 
establishment of a more positive and secure 
academic self-concept on the part of the fa- 
vorably evaluated child. Such a child would 
be more willing, indeed more able, to meet 
the next academic challenge. If these self- 
concept differences were reinforced each 
year, through the transmission of informa- 
tion between teachers, a profound shaping 
influence on the child would be expected. 
How could such an influence operate? It is 
conceivable that simple temporal differences 
in teacher-student interactions might ac- 
count for later performance differences. 
Presumably, a teacher who finds a child obedient
and a pleasure to have in class would
be willing to expend more time and energy
on such a child than on one considered disobedient
and unpleasant (see Rist, 1970).
Such a teacher also would likely be more attentive
to such students, thus more readily
recognizing in them any confusions or misunderstandings.
Such reactions would impinge
not only on the child's academic performance
but on his self-concept as well.
These observations argue for a more extensive
and rigorous experimental and observational
study of the teacher-student interaction
process, a call that echoes that of
Brophy and Good (1970, 1974), West and
Anderson (1976), Meichenbaum et al. (1969),
and many others. This research further
specifies a realm of treatment and outcome
measures of potentially great theoretical
importance, which have not received the
research attention that they apparently deserve.
It is obvious on the basis of the present
research that the effects of social evaluations
and expectations, based as they are on aspects
of a child totally incidental to intellectual
promise, represent a potential educational
inequity of even greater moment
than that emphasized by Rosenthal and
Jacobson (1968). We hope that this research
alerts investigators to the potential influence
of social-behavioral expectations on children's
intellectual and academic development
and that it proves instrumental in the
instigation of more focused and critical research
in the future.


Social Comparison in the Classroom:
The Relationship Between Academic Achievement
and Self-concept


Carl M. Rogers, Monte D. Smith, and J. Michael Coleman 
George Peabody College for Teachers 


One hypothesis derived from social comparison theory is that the relationship 
between academic achievement and self-concept can best be understood in
terms of the child’s achievement standing compared with that of classmates.
This hypothesis was tested on a sample of 159 academic underachievers in
self-contained classrooms. When relative within-classroom achievement
standing was not considered, reading achievement was not significantly relat- 
ed to self-concept, although mathematics achievement was. When relative 
within-classroom achievement standing was considered, both reading and 
math achievement were found to be significantly related to self-concept. 


Numerous studies have examined the re- 
lationship between children’s academic 
achievement and self-concept (Purkey, 
1970). Although many studies have reported 
a significant relationship between these two 
variables (e.g., Black, 1974; Bledsoe, 1967; 
Coopersmith, 1959; Fink, 1962; Lamy, 1965; 
R. Williams & Cole, 1968; Kohr, Note 1), 
others have failed to find any substantial 
relationship between academic achievement 
and self-concept (e.g., Lewis, 1972; 
Wattenberg & Clifford, 1964; J. Williams, 
1973). Even studies that reported significant 
academic achievement/self-concept rela- 
tionships typically reported low correlations 
between the two variables which, although 
statistically significant, had low predictive 
utility in terms of accounting for much of the 
observed variability of scores. This difficulty 
was summarized by Kohr (Note 1), who
stated, “Although the relationship between
self-concept and academic achievement was
statistically significant, it would appear to 
be neither substantial in degree nor simple 
in direction” (p. 7). 

A pervasive problem in self-concept/aca- 
demic achievement investigations has been 
a relative lack of concern with theoretical 
models, often resulting in technically ade- 
quate but conceptually weak investigations 
potentially masking rather than clarifying 
any existing relationship. Researchers ex- 
amining the relationship between academic 
achievement and self-concept typically have 
seemed to assume that this relationship is 
invariant and is manifest independently of 
other environmental or psychological factors. 
Hence, research in this area has paid little 
attention to other factors, such as the aca- 
demic or social environment from which 
samples or parts of samples were drawn and 
how such factors might have influenced the 
hypothesized relationship. 

Several theoretical statements on the de- 
velopment and maintenance of the self- 
concept, however, emphasize the importance 
of the social environment. For example,
Gecas, Calonico, and Thomas (Note 2) discussed
two prominent theoretical orientations,
model theory and mirror theory, both
of which emphasize the interplay between
the environment and the individual. Model
theory suggests that a child develops a sense
of self-regard through the process of imitating
various others in the immediate environment;
mirror theory proposes that the
self-concept is a product of the reflected 
appraisals of others significant to the child. 
Closely related to mirror theory is Festinger’s 
(1954) theory of social comparison. Festinger 
suggested that in the absence of objective 
standards of comparison, people use signif- 
icant others in their environment as the 
bases for forming estimates of self-worth. 
The theory of social comparison processes, 
as it articulates with self-theory, has been 
explicated most clearly by Hyman and 
Singer (1967), who concluded that the self- 
concept is constructed on an edifice of social 
comparisons. Despite their differences, each 
of these theories would maintain that the 
process by which the individual develops and 
maintains self-regard is critically dependent 
on the social group in which the individual 
resides. 

These orientations suggest that the self- 
concept/academic achievement relationship 
can best be understood within the context of 
the person’s immediate social environment. 
Specifically, in terms of social comparison 
theory, we would expect that the importance 
of academic achievement for self-concept lies 
not in the absolute level of achievement but 
in the child’s perceptions of how his/her level 
of achievement compares with the achieve- 
ment of those in his/her social comparison 
group, in this case other classmates. Two 
children having identical achievement test 
results but residing in different classrooms 
would be expected to have differing self- 
concepts to the extent that their relative 
academic standing in each class differed. 

The implications of this theoretical derivation
for research on the self-concept/academic
achievement relationship are far from
trivial. Most studies in this area have sought
large samples in order to augment generalizability
and statistical power. This has led
many researchers to gather data from different
classrooms, schools, or even communities
and then pool the data together for
purposes of analysis. This could potentially
lead, in its extreme form, to a total masking 
of the relationship between academic 
achievement and self-concept. For example, 
consider two third-grade classrooms in each
of which there is a perfect positive relationship
between academic achievement and
self-concept but in which the general level of 
academic achievement differs substantially, 
with the highest achiever in the first class- 
room achieving at the same level as the low- 
est achiever in the second classroom. If the 
self-concept and achievement data for these 
two classrooms were pooled and then a cor- 
relation between the two variables was 
computed, the analysis might reveal a lim- 
ited or even inverse relationship between the 
two variables even though within each 
classroom a perfect correlation existed. 
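
The masking effect in this example can be illustrated with a small numerical sketch. All scores below are invented for illustration, and the sketch assumes that self-concept depends only on rank within the classroom, which is the extreme case the argument describes rather than anything measured in this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# HYPOTHETICAL achievement scores: the top of classroom 1 equals the bottom of classroom 2.
classroom_1 = [1, 2, 3, 4, 5]
classroom_2 = [5, 6, 7, 8, 9]
# ASSUMPTION: self-concept mirrors within-class rank, so both rooms share these scores.
self_concept = [1, 2, 3, 4, 5]

r_class_1 = pearson_r(classroom_1, self_concept)
r_class_2 = pearson_r(classroom_2, self_concept)
r_pooled = pearson_r(classroom_1 + classroom_2, self_concept + self_concept)
print(r_class_1, r_class_2, round(r_pooled, 2))
```

With these invented scores each classroom shows a perfect correlation, while pooling drops it to about .58. Larger gaps between classroom achievement levels would push the pooled value lower still.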

In this view, the maintenance of self-con- 
cept is a phenomenological process related 
to the attributes of the social comparison 
group within which the individual resides. 
This study tested the hypothesis that the 
relationship between academic achievement 
and self-concept is manifest most clearly 
within the context of specific social com- 
parison groups, or classrooms. Specifically, 
two predictions were made. First, it was 
predicted that academic achievement and 
self-concept would be positively related, 
even among academic underachievers in 
special education classrooms. Second, it was 
predicted that the self-concept/academic 
achievement relationship would be manifest 
most strongly when academic standing 
within immediate peer-reference groups (i.e., 
classrooms) was incorporated into the analyses.


Method 


Participants 


Participants in this study were 159 academic under- 
achievers in 17 classrooms in seven elementary schools 
of a major metropolitan school system. Participants 
ranged in age from 6 years 1 month to 12 years 1 month 
(M = 9 years 6 months); 22% were black and 25% were
female. Classroom enrollment ranged from 6 to 14
children.

Children had been placed in the classrooms on the
basis of severe academic deficits. Metropolitan
Achievement Test scores indicated that the children
were functioning approximately 2 years below age-appropriate
grade levels. The children had been enrolled
in the special education classrooms an average of
14 academic months. Prerequisites for referral were
severe academic deficiencies, normal or low-normal 
intellectual capability, parental consent, and freedom 
from visual, hearing, or neurological handicaps. The
objective of placement in the special classrooms was
academic remediation. The 17 special education 
teachers employed a variety of teaching techniques, 
focusing on the attainment of sufficient remediation to 
warrant returning the children to regular classrooms. 

Family socioeconomic (SES) information was avail- 
able from school records on 134 children. A composite 
family SES score was calculated for each child (Smith, 
Zingale, & Coleman, in press), based on parental edu- 
cation and occupation, with the utilization of a scale 
adapted from Warner, Meeker, and Eels (1949). The 
scale values ranged from 1 (highest SES) to 5 (lowest 
SES). Mean SES scores for the seven schools ranged 
from 3.54 to 4.24, which indicated lower-middle-class 
and working-class student composition. An analysis of 
variance (ANOVA) indicated no significant SES dif- 
ferences across schools, F(6, 127) = 1.98, ns. 


Procedure 


Participants were tested within their classrooms. Two 
testing instruments were group administered, the 
Metropolitan Achievement Test (MAT) and the 
Piers-Harris Children’s Self-Concept Scale. Given the 
age and ability differences across the entire sample, two 
different versions of the MAT were used: The MAT 
Primary I and the MAT Primary II. For children taking 
the MAT Primary I, Total Reading and Total Mathe- 
matics achievement grade equivalents were obtained 
for each child. For children taking the MAT Primary 
II, the Word Knowledge, Reading, Math Computation 
and Math Concepts subtests were administered, 
yielding a Total Reading grade equivalent and two 
different mathematics grade equivalents (Computation 
and Concepts). For the purposes of this study we treated 
the average of these two math achievement grade 
equivalents as roughly comparable with the Total 
Mathematics achievement grade equivalent yielded by 
the Primary I. 

The second instrument, the Piers-Harris Children's 
Self-Concept Scale (Piers, 1969), consists of 80 state- 
ments of a declarative nature (e.g., “My friends think 
that I have good ideas”) to each of which the respondent 
marks yes or no. Approximately one half of the state- 
ments are positively worded, and the remainder are 
negatively worded to attenuate potential acquiescent 
response sets. Items were orally administered, a pro- 
cedure that has been suggested for administration of the 
Piers-Harris to children functioning at or below the 
fifth-grade level (Piers, 1969). The Piers-Harris yields 
a composite self-concept score that may range from 0 
to 80. In addition, the scale may be scored for six cluster 
scores, each purporting to measure one of these subdi- 
mensions of self-concept: (a) Behavior, (b) Intellectual 
and School Status, (c) Physical Appearance and At- 
tributes, (d) Anxiety, (e) Popularity, and (f) Happiness 
and Satisfaction. The Piers-Harris manual (Piers, 1969) 
reported Kuder-Richardson Formula 21 homogeneity 
coefficients ranging from .78 to .93. Four-month test- 
retest coefficients of stability ranged from .71 to [illegible]. 

Two recent comparative reviews of the Piers-Harris 
[illegible] by Robinson and Shaver (1973) and Wylie 
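
The Kuder-Richardson Formula 21 coefficient mentioned above can be computed from the number of items, the mean total score, and the variance of total scores. The sketch below uses the standard textbook form of KR-21, which assumes dichotomously scored items of equal difficulty. The mean and variance passed in are hypothetical values chosen for illustration; only the item count of 80 comes from the description of the scale.

```python
def kr21(n_items, mean_total, var_total):
    """Kuder-Richardson Formula 21 for dichotomously scored items.

    KR-21 = (k / (k - 1)) * (1 - M * (k - M) / (k * s^2)),
    where k is the number of items, M the mean total score,
    and s^2 the variance of total scores.
    """
    if n_items < 2:
        raise ValueError("KR-21 needs at least two items")
    if var_total <= 0:
        raise ValueError("variance of total scores must be positive")
    k = n_items
    return (k / (k - 1)) * (1 - (mean_total * (k - mean_total)) / (k * var_total))


# 80 yes/no items; mean and variance are hypothetical, not values from the manual.
print(round(kr21(80, 40.0, 400.0), 3))
```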
Analyses 


Two series of analyses were computed. First, all 
subjects were pooled together and rank ordered on the 
basis of their achievement scores (once for math 
achievement and once for reading achievement) and on 
the basis of these rank orderings were assigned to either 
a high-, medium-, or low-achieving group for mathe- 
matics and for reading. On the basis of these two tri- 
chotomizations of achievement data, 14 one-factor 
between-groups ANOVAS were computed (7 for read- 
ing achievement and 7 for math achievement), with 
composite self-concept scores and individual cluster 
scores as the dependent measures. 

Second, participants were rank ordered within each 
classroom according to their performance on the mea- 
sure of mathematics achievement and then according 
to their performance on the measure of reading 
achievement. Within each class, for each of the two rank 
orderings, the participants were assigned to one of three 
groups: high, medium, or low within-classroom aca- 
demic achievement. To avoid any systematic placement 
bias, we decided that when class size was not divisible 
by three, the number of participants assigned to the 
high and low groups for a given class would be equal. 
Hence, if the classroom contained 11 children, 3 were 
assigned to the high-achieving group, 3 to the low- 
achieving group, and 5 to the medium-achieving group. 
Finally, for both mathematics and reading achievement, 
high within-class achievers were pooled together across 
classrooms, as were medium and low within-class 
achievers, and 14 one-factor between-groups ANOVAS 
were computed, 7 for mathematics achievement and 7 
for reading achievement, with composite self-concept 
scores and individual cluster scores as the dependent 
measures. 
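
The within-classroom assignment rule described above can be sketched in Python as follows. The example of 11 children (3 low, 5 medium, 3 high) is consistent with giving each tail group the integer part of one third of the class; that rounding rule, the handling of tied scores (ties keep their input order), and all names in the code are assumptions made for illustration, since the text does not specify them.

```python
def trichotomize_class(scores):
    """Assign each child in one classroom to 'low', 'medium', or 'high'.

    scores: dict mapping a child identifier to an achievement score.
    The high and low groups always receive the same number of children
    (class size // 3, an assumed rounding rule); everyone else is 'medium'.
    """
    ranked = sorted(scores, key=lambda child: scores[child])  # lowest score first
    n = len(ranked)
    tail = n // 3
    groups = {}
    for position, child in enumerate(ranked):
        if position < tail:
            groups[child] = "low"
        elif position >= n - tail:
            groups[child] = "high"
        else:
            groups[child] = "medium"
    return groups


def pool_across_classrooms(classrooms):
    """Pool within-class groups across classrooms.

    classrooms: list of dicts (one per classroom) of child -> score.
    Returns a dict mapping group label -> list of child identifiers.
    """
    pooled = {"low": [], "medium": [], "high": []}
    for scores in classrooms:
        for child, label in trichotomize_class(scores).items():
            pooled[label].append(child)
    return pooled


# A hypothetical class of 11 children with grade-equivalent scores.
example = {f"child{i}": 1.0 + 0.1 * i for i in range(11)}
labels = list(trichotomize_class(example).values())
print(labels.count("low"), labels.count("medium"), labels.count("high"))
```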


Results 


When assignment to high-, medium-, or 
low-achievement groups was based on 
within-classroom reading achievement, 
ANOVAS yielded significant group differ- 
ences in mean composite self-concept scores, 
F(2, 153) = 5.32, p < .007, and significant 
group differences on all of the six cluster 
scores. Results of these analyses are sum- 
marized in Table 1. Newman-Keuls tests 
indicated significant pair-wise comparisons 
between high- and low-achievement groups 
on the composite self-concept score and on 
all six cluster scores. In addition, the medi- 
um- and high-achievement groups differed 
significantly on the Intellectual and School 
Status cluster score, and the low- and me- 
dium-reading-achievement groups differed 


the reading-achievement subtests. 
Table 1 
Mean Self-concept Scores for Three Reading-Achievement Groups 
Achievement 
Low Medium High 
Partition/criteria M SD n M SD n M SD n F p 
Within classroom $ 
Composite 510 13.0 46 55.3 122 64 
Behavior 118 35 127 33 ds 28 ^ POM 
zum and WT AW t 
School Status 11.7 . E 
vi Appearance 3.8 124 $5 Ml 34 547 005 
and Attributes 7.6 3.1 8.5 2.9 E 
Anxiety $4 n 718 26 83 26 698 Q5 
Ae ; A 7.7 2.5 8.6 2.6 312 05 
Satisfaction 6.2 19 
Irespece of classroom js ed Ps d ra XA 
Jomposite 53.3 112 52 6561 150 52 566 140 52 109 
Behavior 117101553 VOY 46 Q5 
Inlet uH 13.0 3.4 13.4 3.8 3.446  ,05 
School Status 124 3.3 13.0 y 
Physical Appearance : i i Y % 3 
and Attributes 8.6 18 8.5 3.1 8.1 3.2 «1 ns 
Anxiety* 6.9 2.7 8.0 29 7.8 2.6 241 
Popularity 7.6 23 7.6 3.0 8.4 2.4 1.97 
Happiness and 
Satisfaction 6.5 18 6.6 23 7.0 1.9 <l ns 


* High scores indicate low anxiety. 


significantly on the Anxiety cluster score. On 
all seven dependent measures, the high- 
reading-achievement group obtained the 
highest mean self-concept score, the low- 
reading-achievement group obtained the 
lowest mean self-concept score, and the 
mean for the medium-achievement group 
was intermediate in magnitude. 

When the reading achievement trichot- 
omization was conducted irrespective of 
within-classroom standing, on the other 
hand, an ANOVA indicated no significant 
differences among groups in terms of mean 
composite self-concept scores, F(2, 153) = 
1.09. Moreover, there were no significant 
differences on five of the six cluster scores. 
Group differences were manifest on the Be- 
havior cluster score, where both medium- 
and high-reading-achievement groups ex- 
hibited means significantly greater than the 
low-achievement group but did not differ 
from one another. 

When the partition into achievement 
groups was conducted without considering 
within-classroom standing, the mean com- 
posite self-concept discrepancy between the 
high- and low-achievement groups was only 
3.3 points. On the other hand, this discrep- 
ancy was 9.0 points when the partition was 
made on the basis of relative within-class- 
room reading achievement. The test for lin- 
ear trend on composite self-concept for the 
within-classroom partition was highly sig- 
nificant, F(1, 153) = 10.62, p < .01, whereas 
a similar test for the classroom-irrespective 
partition was nonsignificant, F(1, 153) = 
1.56, ns. 
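
A test for linear trend across three ordered groups can be expressed as a single-degree-of-freedom contrast with weights -1, 0, and +1 for the low, medium, and high groups. The sketch below shows one conventional way to compute such a test; whether the authors used exactly this computation is not stated in the text, and the scores used in the example are hypothetical.

```python
def linear_trend_f(low, medium, high):
    """F test for linear trend across three ordered groups.

    Uses contrast weights (-1, 0, +1) and the pooled within-groups
    mean square as the error term. Returns (F, df_contrast, df_within).
    """
    groups = [low, medium, high]
    weights = [-1, 0, 1]
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_within = n_total - len(groups)
    ms_within = ss_within / df_within
    # Contrast value and its sum of squares (allows unequal group sizes).
    contrast = sum(w * m for w, m in zip(weights, means))
    ss_contrast = contrast ** 2 / sum(w ** 2 / len(g) for w, g in zip(weights, groups))
    return ss_contrast / ms_within, 1, df_within


# Hypothetical self-concept scores for three small groups.
print(linear_trend_f([1, 2, 3], [2, 3, 4], [3, 4, 5]))
```

With 156 participants in three groups the error term has 156 - 3 = 153 degrees of freedom, which matches the F(1, 153) values reported for the reading-achievement partitions.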

When high, medium, and low groups were 
formed on the basis of math achievement, 
analyses of composite self-concept scores 
yielded significant group differences for both 
the within-classroom partition, F(2, 156) = 
12.02, p < .0001, and for the classroom-ir- 
respective partition, F(2, 156) = 4.17, p < 
.025. Results of the analyses based on math 
achievement are summarized in Table 2. 

When the partition was based on within- 
classroom performance, pair-wise compari- 
sons revealed that low math achievers ob- 
tained significantly lower composite self- 
concept scores than either medium or high 
math achievers but medium and high groups 
did not differ. The mean discrepancy be- 
tween high- and low-achievement groups 
was 12.8 points. Significant group differences 
were obtained on five of the six cluster 
scores. Both medium- and high-achievement 
group means differed significantly from the 
Table 2 
Mean Self-concept Scores for Three Math-Achievement Groups 

                                              Achievement 
                                   Low             Medium            High 
Partition/criteria              M    SD   n      M    SD   n      M    SD   n 

Within classroom 
  Composite                   48.4 11.4  46    56.6 12.6  67    61.2 14.1  46 
  Behavior                    10.7  3.4        13.5  3.2        13.8  3.4 
  Intellectual and 
    School Status             11.3  3.6        13.0  3.5        13.9  3.5 
  Physical Appearance 
    and Attributes             7.9  2.8         8.4  2.9         9.3  3.0 
  Anxiety^a                    6.2  2.5         8.1  2.6         8.7  2.5 
  Popularity                   6.9  2.3         8.0  2.5         8.8  2.7 
  Happiness and 
    Satisfaction               6.0  1.9         6.9  1.8         7.4  2.0 
Irrespective of classroom 
  Composite                   51.5 10.1  53    56.2 14.9  53    58.9 14.5  53 
  Behavior                    11.5  3.1        12.9  3.6        13.9  3.5 
  Intellectual and 
    School Status             12.6  2.8        12.4  4.1        13.3  3.8 
  Physical Appearance 
    and Attributes             8.8  2.2         8.0  3.3         8.6  3.1 
  Anxiety^a                    6.4  2.5         7.9  2.6         8.3  2.7 
  Popularity                   7.5  2.0         7.7  2.9         8.6  2.7 
  Happiness and 
    Satisfaction               6.0  1.8         7.4  1.9         7.2  2.0 


^a High scores indicate low anxiety. 


low-achievement group means on the five 
scores that yielded significant F ratios. In 
addition, the medium and high groups dif- 
fered significantly on the Anxiety cluster 
score. 

In the trichotomization irrespective of 
within-classroom achievement, only low and 
high groups differed significantly on com- 
posite self-concept, with a mean difference 
of 7.4 points. Three of the six cluster score 
analyses indicated significant group differ- 
ences for the classroom-irrespective parti- 
tion: Behavior, Anxiety, and Happiness and 
Satisfaction. In each case, both medium- and 
high-achievement groups obtained signifi- 
cantly greater mean scores than the low- 
achievement group, but they did not differ 
significantly from each other. 

Tests for linear trend among composite 
self-concept scores produced significant re- 
sults for both the within-classroom partition, 
F(1, 156) = 23.28, p < .01, and the class- 
room-irrespective partition, F(1, 156) = 8.15, 
p < .01. 

Since previous research has indicated a 
positive relationship between IQ and self- 
concept (e.g., Coopersmith, 1967; Piers, 
1969), the differential results found with 
the two methods of examining the academic 
achievement/self-concept relationship may 
have been the result of greater concomitant 
variation of IQ and achievement when par- 
ticipants were grouped on the basis of 
within-classroom reading achievement 
standing. To test this alternative explana- 
tion, we conducted two further ANOVAS, 
using IQ scores available for 118 of the par- 
ticipants from a previous administration of 
the Wechsler Intelligence Scale for Chil- 
dren—Revised as the dependent variable 
and high-, medium-, and low-reading- 
achievement groupings as the independent 
variable. The results of these analyses are 
summarized in Table 3. The first analysis, 
with groupings derived from within-class- 
room achievement standing, yielded signif- 
icant group differences, F(2, 115) = 3.32, p 
< .05, with low reading achievers exhibiting 
significantly lower IQs than high achievers 
and medium reading achievers not differing 
in IQ significantly from either low or high 
achievers. The second analysis, with 
achievement groupings derived without 
concern for within-classroom achievement 
standing, also yielded a significant group 
difference, F(2, 115) = 7.70, p < .001. Low 
reading achievers had significantly lower IQs 
than medium or high achievers, but medium 
Table 3 
Mean WISC-R IQs for Three Reading-Achievement Groups 

                                        Achievement 
                         Low               Medium              High 
Partition             M     SD    n      M     SD    n      M     SD    n      F 

Within classroom    84.53 10.38  32    87.46 10.68  46    90.78  9.76  40    3.32* 
Irrespective of 
  classroom         81.67  9.00  30    89.89 11.33  47    89.90  9.13  41    7.70** 

Note. WISC-R = Wechsler Intelligence Scale for Children—Revised. 

*p < .05. 
**p < .01. 


and high achievers did not differ signifi- 
cantly in IQ. These results tend to rule out 
the IQ/self-concept hypothesis as an alter- 
native explanation, especially given that the 
IQ discrepancy between the high and low 
reading achievers was only 6.25 points when 
trichotomization occurred within classrooms 
but was 8.23 points when the trichotomiza- 
tion occurred irrespective of within-class- 
room achievement standing. 


Discussion 


These results strongly supported our basic 
hypothesis that the relationship between 
academic achievement and self-concept is 
manifest most strongly within the context of 
the social comparison group or classroom. 
When participants were assigned to either 
a high-, medium-, or low-achievement group 
within their particular classroom on the basis 
of either reading or math achievement test 
results, a strong positive relationship was 
found between academic achievement and 
self-concept. This relationship appears to be 
not only statistically significant but sub- 
stantial as well, with an average composite 
self-concept score difference between low 
and high achievers of 9 points for reading 
achievement and almost 13 points for math 
achievement. In contrast, when this tricho- 
tomization was conducted irrespective of 
within-classroom achievement standing, no 
relationship was found between reading 
achievement and self-concept, and although 
a significant relationship was found between 
math achievement and self-concept, the 
strength of this relationship was substan- 
tially less than when trichotomization was 
conducted within the classroom. 

Analyses of subdimensions of self-concept, 
as measured by the Piers-Harris cluster 
scores, also strongly supported the hypoth- 
esized social comparison bases of self-con- 
cept maintenance. For example, when the 
within-classroom partitions were made on 
the basis of reading achievement, the clus- 
ter-score means uniformly were ordered high 
achievement > medium achievement > low 
achievement, and all six ANOVAS were 
significant. This pattern of mean scores is in 
sharp contrast to the classroom-irrespective 
partition, in which the high > medium > low 
pattern was manifest on only two cluster 
scores and only one of these reflected sig- 
nificant group differences. 

This investigation was predicated upon 
the theoretical premise that an individual's 
self-concept is based in part on an edifice of 
social comparisons. This viewpoint assumes 
that the group(s) available to the individual 
are appropriate for making self-concept- 
relevant social comparisons. A child could 
well be a member of a classroom, however, 
and never utilize his/her classmates for 
comparison purposes with regard to some 
dimensions of self-concept. For example, a 
child's self-concept of himself/herself as a 
gymnast might be based minimally, if at all, 
on school classmates but rather on the 
members of the evening gymnastics class at 
the Y. 

In the context of the present study the 
dimension of self-concept that should be 
most sensitive to within-classroom social 
comparisons is self-concept of academic 
ability. While the child’s classmates might 
be inappropriate as a social comparison 
group for some dimensions of self-concept, 
this group should be salient and optimally 
appropriate for forming and maintaining the 
self-concept of academic ability. 
The Piers-Harris scale used in this study 
yields an Intellectual and School Status 
cluster score that conceptually appears to 
tap the self-concept of academic ability 
construct. (Representative items: I am good 
in my schoolwork. I am a good reader. My 
classmates think I have good ideas.) This 
cluster score was extremely sensitive to 
within-classroom and classroom-irrespective 
partitions on both reading and math 
achievement indexes. Significant group 
differences on this variable were obtained for 
both reading and math achievement 
within-classroom partitions, but no group 
differences were obtained when the parti- 
tions on achievement were made irrespective 
of relative within-classroom performance. 
Thus, the social comparison basis of self- 
concept maintenance seems especially clear 
when the subdimension of self-concept 
conceptually most relevant to the classroom 
peer group is examined closely. 

While the results of this study seem rela- 
tively clear, a number of factors do poten- 
tially limit their generalizability to other 
settings. First, participants in this study 
were academic underachievers attending 
special classrooms. Although we have no 
reason to believe that these results originate 
in the unique characteristics of the sample, 
replication of our findings with a more nor- 
mative sample would be desirable. Second, 
our decision to consider the average of Math 
Computation and Math Concepts grade 
equivalents for children taking the MAT 
Primary II as roughly equivalent to the Total 
Mathematics grade equivalent yielded for 
children taking the MAT Primary I con- 
ceivably may have led to a confounding of 
results for math achievement analyses. This 
was not thought to be a serious problem, 
however, since the Total Mathematics score 
on the Primary I consists of a combination 
of computation and concepts components. 
Therefore, the Primary I Total Math and an 
average of the Primary II Math Computa- 
tion and Math Concepts should be at least 
roughly comparable. Third, we have made no 
effort to differentiate between self-concept 
and self-report, notwithstanding widespread 
[illegible] that the reported self-concept 
an individual willingly divulges about 
himself/herself and what an individual truly 
thinks of himself/herself may not correspond 
exactly. The measurement of self-concept is 
fraught with problems of social desirability, 
systematic response bias, response restric- 
tion, contextual effects, and myriad other 
pitfalls (Wylie, 1974). Numerous indirect, 
unobtrusive, and projective self-concept 
assessment instruments have been devel- 
oped over the years (Robinson & Shaver, 
1973; Wylie, 1974) in efforts to avoid the 
pitfalls inherent in the administration of 
self-report inventories. We realize the 
drawbacks and limitations to self-report 
inventories, yet we believe the administra- 
tion of a well-standardized self-report in- 
strument is preferable to the projective or 
less obtrusive alternatives. Our philosophy 
of self-concept measurement was succinctly 
summarized by Nunnally (1975): “Long ago 
the author came to the conclusion that gen- 
erally the most valid, economical, sometimes 
the only, way to learn about a person's sen- 
timents is simply to ask him” (pp. 106- 
107). 

Despite the possible limitations of gener- 
alizability, our findings clearly support the 
hypothesis derived from social comparison 
theory that the most meaningful way to un- 
derstand the relationship between academic 
achievement and self-concept is within the 
context of the social comparison group or 
classroom. When information regarding 
relative academic standing within the 
classroom was considered, a strong rela- 
tionship between both reading and math 
achievement with self-concept was found, 
but when information regarding relative 
academic standing within the classroom was 
not considered, reading achievement showed 
no relationship to self-concept, and the ob- 
served relationship between math achieve- 
ment and self-concept was substantially less 
robust. Our results suggest that one basic 
way in which academic achievement influ- 
ences self-concept is through the process of 
social comparison: The child compares his or 
her own level of achievement to the 
achievement levels of others in the class- 
room, and to the extent that the results of 
such a comparison are favorable, his or her 
self-concept is enhanced, but if the com- 
parison is unfavorable, his or her self-con- 
cept may be diminished. 
An alternative model of cognitive abilities has been proposed by Das, Kirby, 
and Jarman. This model states that information is integrated in the brain in 
two ways, through simultaneous and successive processing. The present study 
compared this information-processing model of cognitive abilities with a tra- 
ditional primary mental abilities model. It was found that simultaneous pro- 
cessing was primarily related to spatial ability. It was also related, to a lesser 
extent, to both memory and inductive-reasoning abilities. Both simultaneous 
and successive processing were related to memory ability. No evidence was 
found to suggest that simultaneous and successive processing could be equat- 
ed with, respectively, reasoning and memory, or, more generally, Level II and 
Level I abilities. It is suggested that Level II ability may be a conglomeration 
of reasoning, spatial ability, and some aspects of simultaneous processing. 


Scientific psychology has evolved two 
methodologies for the study of human be- 
havior (Cronbach, 1957). Experimental 
psychology has attempted to construct laws 
for the effects of environmental manipula- 
tions on “persons in general,” whereas 
correlational, or individual differences, 
psychology has described the structure of 
individual variation in isolation from envi- 
ronmental effects. The major result of 
Cronbach’s (1957) plea for the unification of 
these two disciplines has been the study of 
Aptitude X Treatment interactions, wherein 
the effects of environmental manipulations 
are modulated by individual difference 
variables. This methodological hybrid has 
generated extensive research, and though 
significant interactions have been elusive 
(Goldberg, 1972), the new discipline is 
flourishing (Cronbach, 1975). 

More recently (e.g., Carroll, 1976; Estes, 
1974), a second form of integration, logically 
derived from Cronbach's statement, has 
begun in the cognitive domain. The goal of 
this integration is the description of indi- 
vidual difference variables (abilities, intel- 
ligence) in terms of constructs now used in 
experimental psychology. These constructs 
include such elements of traditional memory 
models (e.g., Atkinson & Shiffrin, 1968; 
Hunt, 1971) as short- and long-term memory 
[illegible]. It is 
hoped that the use of the cognitive, pro- 
cess-oriented constructs will aid in an un- 
derstanding of the nature of intellectual 
ability. 

Estes (1974) and Carroll (1976) have 
subjectively analyzed a number of tradi- 
tional psychometric tests and have derived 
the component processes that underlie suc- 
cess or failure in these tasks. Hunt and his 
colleagues (Hunt, Frost, & Lunneborg, 1973; 
Hunt, Lunneborg, & Lewis, 1975; Hunt & 
Lansman, 1975) have investigated the cor- 
relates of verbal and quantitative ability 
(psychometric variables); they have sug- 
gested that verbal ability is related to ra- 
pidity of short-term memory processing and 
that quantitative ability is related to resis- 
tance to interference. 

Yet another information-processing ap- 
proach to individual differences, and one 
that is consistent with those mentioned 
above, is that of simultaneous and successive 
processing, as developed by Das and his 
colleagues (e.g., Das, Kirby, & Jarman, 1975). 
This model has its roots in Luria’s (1966a, 
1966b) clinical observations of brain-dam- 
aged patients and has been investigated 
through the development of a battery of 
cognitive tasks. Briefly, simultaneous pro- 
cessing can be characterized as involving the 
synthesis of separate elements into groups 
that generally have spatial overtones, with 
all portions of the synthesis being surveyable 
or accessible without dependence on their 
position within the synthesis. This type of 
processing is required, for instance, in the 
formation of any holistic gestalt, or in the 
discovery of the relationships among two or 
more objects. Successive processing, on the 
other hand, involves the integration of sep- 
arate elements into groups whose essential 
nature is temporal. Portions of this synthesis 
are accessible only in the temporal order of 
the series— each element leads to only one 
other, and access to any element is depen- 
dent on the preceding elements. Successive 
processing is necessary for the formation or 
production of any ordered series of events. 
According to Luria’s data, lesions of the 
posterior (parietal-occipital) cortex disrupt 
simultaneous processing, whereas lesions of 
the anterior (frontal-temporal) cortex dis- 
rupt successive processing. Further infor- 
mation regarding Luria’s original work, the 
development of the test battery, and the re- 
sults of a number of factor-analytic studies 
can be found in two previous review articles 
(Das, 1973; Das et al., 1975). The possible 
implications for Aptitude X Treatment in- 
teractions are given in Das and Molloy 
(1975). 

The simultaneous-successive battery 
(which will be described more completely 
below) normally yields three factors. The 
first, simultaneous processing, is defined by 
tests such as Raven’s matrices, a figure 
copying task, and the Memory-for-Designs 
test. Successive processing is normally de- 
fined by serial recall of familiar words, a 
visual short-term memory task for digits on 
a spatial grid, and by digit span. The third 
factor has been identified as speed; it is de- 
fined by a word reading speed test and by a 
color naming speed test. 


The purpose of the present study was to 
relate these information-processing factors, 
derived from the simultaneous-successive 
battery, to a more traditional model of 
human abilities. The traditional model that 
was chosen for comparison purposes was the 
primary mental abilities model. As Horn 
(1976) and others have suggested, the 
present variety of abilities models (e.g., 
Cattell, 1971; Vernon, 1950) forms a con- 
sensus on the outline of a general abilities 
model. In essence, this model states that 
abilities are hierarchical, proceeding from 
relatively specific primary factors to more 
general higher order factors, and it is well 
represented by the primary mental abilities 
(French, Ekstrom, & Price, 1963; Thurstone, 
1938). 

The selection of tests to represent the 
primary mental abilities was determined by 
the ways in which simultaneous and suc- 
cessive processing have been described and 
measured. Though composed largely of 
memory and reasoning tasks, the simulta- 
neous-successive battery has been said to 
produce factors that are neither memory nor 
reasoning (Das et al., 1975). For this reason, 
and because spatial ability would seem to be 
most related to the concept of simultaneous 
processing, it was decided to include tests of 
spatial, memory, and reasoning abilities in 
the primary mental abilities (PMA) battery. 


Method 


Subjects 


The subjects were 104 fourth-grade boys, attending 
five urban schools. These schools represented a good 
cross-section of the city. The mean age was 110 months 
(SD = 4.9); mean verbal and nonverbal IQs (Lorge- 
Thorndike) were 102 (SD = 16.7) and 109 (SD = 16.9), 
respectively. 


Tests 


The following tests were administered to all subjects. 
All tests except digit span, word reading, and color 
naming, which were given individually, were given to 
classroom-sized groups. Except where otherwise indi- 
cated, tests were scored for number of items correct. 

Raven’s Coloured Progressive Matrices. A tradi- 
tional test of general, nonverbal reasoning (Raven, 1938, 
1965) or of fluid intelligence (Cattell, 1971), the matrices 
have been found by Das (e.g., 1972) to be a marker for 
simultaneous processing. The subject is required to 
indicate which of six alternatives correctly completes 
a given nonverbal pattern. 

Figure copying. Developed by the Gesell Institute 
(Ilg & Ames, 1964), this test requires the subject to 
simply copy geometric figures when they are in view. 
Each of 10 drawings is scored as 0, 1, or 2 according to 
the degree of correctness of reproduction. Scoring cri- 
teria (available in Leong, 1974) emphasize the mainte- 
nance of geometric relations and proportions rather 
than exact reproduction. This test normally loads on 
simultaneous processing. 

Memory-for-Designs Test. This test was originally 
devised by Graham and Kendall (1960) to detect min- 
imal brain damage. Subjects inspect simple geometric 
figures for 5 sec, and after the figure has been removed 
they are required to draw each from memory. Each of 
15 figures is presented separately by means of a Kodak 
Carousel Projector and is scored 0, 1, 2, or 3 according 
to the correctness of the reproduction. As in figure 
copying, drawings are scored for the maintenance of 
relations and proportions rather than for the presence 
and accuracy of details. The test normally loads on si- 
multaneous processing. 

Serial recall. Twenty-four lists of four words are 
played over a tape recorder. After each list, the subject 
is immediately required to write in the correct order as 
many of the words as can be recalled. Twelve of the lists 
are composed of unrelated words and 12 of words that 
are acoustically similar (e.g., tap, cap). This test has 
previously been administered individually with oral 
responses. It is a test of successive processing. 

Visual short-term memory. Subjects view a five- 
digit grid projected by a Kodak Carousel Projector for 
5 sec. Upon removal of the slide, the class is required to 
read an indicated item from a list of colors (filler task). 
They are then permitted to recall as many of the digits 
as possible in an empty grid, in the correct positions. 
There are 20 grids. This is a test of successive process- 
ing. 

Digit span. Subjects are read lists of digits of in- 
creasing lengths. They are given two opportunities to 
successfully recall in correct order at each list length. 
Their score is the maximum list length recalled. This 
test should load on successive processing. 
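
The digit span scoring rule can be sketched in Python as follows. The text specifies two attempts per list length and a score equal to the maximum length recalled; the stopping rule used here (testing ends once both attempts at a length fail) and the data layout are assumptions made for illustration.

```python
def digit_span_score(attempts):
    """Score a digit span protocol.

    attempts: dict mapping list length -> list of up to two booleans,
    True when the list was recalled in the correct order on that attempt.
    Returns the longest list length passed before the first length at
    which every attempt failed (assumed stopping rule).
    """
    span = 0
    for length in sorted(attempts):
        if any(attempts[length]):
            span = length
        else:
            break  # both attempts failed, so testing is assumed to stop here
    return span


# Hypothetical protocol: passes lengths 3 and 4, fails both tries at length 5.
protocol = {3: [True], 4: [False, True], 5: [False, False]}
print(digit_span_score(protocol))
```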

Word reading. Subjects are required to read the 
names of four colors presented 10 times each in random 
order (40 words in all) on a slide projected by a Kodak 
Carousel Projector. The projected image was approxi- 
mately 55 cm × 35 cm. Subjects were instructed to “read 
all the words, as quickly as possible, without making any 
mistakes” and were timed with a stopwatch. The test 
(Stroop, 1935) loads on the speed factor. 

Color naming. This test (Stroop, 1935) is similar to 
word reading, with the exception that the colors on the 
[illegible] to read all 40 strips, and this 
test usually loads on speed. 


Figure grouping. This is a test of reasoning adapted 
[illegible] 

Word grouping. Another reasoning test adapted 
from the SRA kit, this test requires the subject to decide 
which of four words is unlike the others and indicate the 
answer on an IBM answer sheet. 

First and last names. This test was selected from the 
French et al. (1963) Kit of Reference Tests for Cognitive 
Factors. It is a test of associative memory that requires 
the subject to study a list of 15 first names paired with 
last names for 3 minutes. Subjects then have 2 minutes 
to write as many of the first names beside the appro- 
priate last names that appear on a second sheet of paper 
in scrambled order. 

Word-number. This is another test of associative 
memory adapted from the French et al. kit. Subjects 
study a list of 15 numbers paired with words for 3 min- 
utes and then have 2 minutes to write as many of the 
numbers beside the appropriate words, which appear 
on a second sheet of paper in scrambled order. 

Spatial relations. This test of spatial ability is 
adapted from the SRA kit and requires the child to se- 
lect which of four shapes, when added to a given shape, 
will form a square. Answers are placed on an IBM an- 
swer sheet. 

Card rotations. This test of spatial ability is 
adapted from the French et al. kit. Subjects are required 
to indicate whether given figures have been rotated 
within the same plane or flipped. Answers are placed on 
an IBM sheet. Because of the difficulty of this test ([illegible] 
problems in 4 minutes), guessing is penalized by sub- 
tracting the number incorrect from the number cor- 
rect. 
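
The guessing penalty for card rotations can be written as a small Python function. Subtracting the number incorrect from the number correct is the two-alternative case of the general correction-for-guessing formula, right minus wrong divided by (options - 1). The function name and the example counts are hypothetical.

```python
def corrected_score(n_correct, n_incorrect, n_options=2):
    """Correction for guessing: right minus wrong / (options - 1).

    With two response options (rotated vs. flipped) this reduces to
    number correct minus number incorrect, as described for card rotations.
    Omitted items are neither rewarded nor penalized.
    """
    if n_options < 2:
        raise ValueError("an item needs at least two response options")
    return n_correct - n_incorrect / (n_options - 1)


# Hypothetical child: 30 right, 10 wrong, remaining items not attempted.
print(corrected_score(30, 10))
```

Under this rule a child who guesses at random on every attempted item is expected to score about zero, because right and wrong answers are equally likely on a two-choice item.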


Procedure 


All testing was done in the schools. Classroom testing 
took place in either the regular classroom or, in the case 
of open-area schools, in an available enclosed room 
with which the students were familiar. Individual 
testing took place in any available quiet room. Each 
child received all of the tests within a 1-month period. 
Total individual testing (approximately 4 minutes per 
child) and total group testing time (approximately [illegible] 
hours per class) were constant for all classes. Because 
of constraints imposed by school schedules, each class 
received a different order of tests that had not been 
randomized in any systematic manner. 


Results and Discussion 


Before the results are presented, it should 
be emphasized that this study was designed 
to relate the two batteries, not to evaluate 
whether one is in any sense “better.” Indeed, 
there is no simple way to produce such a 
judgment. After relating the factors that 
emerge from the two batteries in the present 
study, an exploratory analysis, factor ana- 
lyzing together the tests of both batteries, 
will be presented. This should provide 
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Correlations of the Eight Simultaneous-Successive Tests and the Six Primary Mental Abilities Tests 
Variable 


Table 1 
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Note. n = 104. Decimals have been omitted. 


evidence regarding the salience of the vari- 
ous factors. 


Relations Between Batteries 


The most appropriate way of relating two 
batteries of tests administered to the same 
individuals is to factor analyze the batteries 
separately, generate factor scores, and then 
correlate the factors of one battery with 
those of the other (Gorsuch, 1974). Correla- 
tions can also be calculated between the 
factors of one battery and the tests of the 
other. 
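
The procedure just described (factor each battery separately, generate factor scores, correlate the scores across batteries) can be sketched as follows. This is only an illustration under stated assumptions: it uses unrotated principal component scores, the data are simulated, and the function names are hypothetical; it is not the authors' computation.

```python
import numpy as np


def component_scores(data, n_factors):
    """Standardized principal component scores for one battery (subjects x tests)."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]  # largest eigenvalues first
    # Dividing by sqrt(eigenvalue) gives each score column unit variance.
    return z @ eigvecs[:, order] / np.sqrt(eigvals[order])


def cross_battery_correlations(battery_a, battery_b, k_a, k_b):
    """Correlate the factor scores of one battery with those of the other."""
    scores_a = component_scores(battery_a, k_a)
    scores_b = component_scores(battery_b, k_b)
    full = np.corrcoef(scores_a, scores_b, rowvar=False)
    return full[:k_a, k_a:]  # rows: battery A factors, columns: battery B factors


# Simulated data (assumption): 104 subjects, 8 tests and 6 tests,
# with one shared source of variance so the batteries are related.
rng = np.random.default_rng(0)
shared = rng.normal(size=(104, 1))
battery_a = rng.normal(size=(104, 8)) + shared
battery_b = rng.normal(size=(104, 6)) + shared
factor_correlations = cross_battery_correlations(battery_a, battery_b, 3, 3)
print(np.round(factor_correlations, 2))
```

Correlating the factors of one battery with the individual tests of the other follows the same pattern, with the standardized test scores taking the place of one set of factor scores.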

The correlations among all tests can be 
seen in Table 1. Those among the simulta- 
neous-successive tests (Tests 1-8) were 
submitted to a principal components anal- 
ysis, and the three factors whose eigenvalues 
were greater than 1.0 were rotated according 
to a varimax criterion. Previous research 
(Das et al., 1975) also dictated the extraction 
of three factors and an orthogonal rotation. 
This analysis can be seen in Table 2. 
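
The two steps named above, retaining components with eigenvalues greater than 1.0 and rotating them to a varimax criterion, can be sketched as below. The correlation matrix is a small hypothetical example, the function names are illustrative, and the rotation is raw varimax without Kaiser normalization, which may differ from the program the authors used.

```python
import numpy as np


def principal_component_loadings(corr, min_eigenvalue=1.0):
    """Loadings of the components whose eigenvalues exceed min_eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (raw, no Kaiser normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


# Hypothetical correlation matrix: two pairs of tests, each pair correlated .6.
corr = np.array([
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
])
loadings = principal_component_loadings(corr)
rotated = varimax(loadings)
print(np.round(rotated, 3))
# Variance accounted for by each rotated factor = column sum of squared loadings.
print(np.round((rotated**2).sum(axis=0), 3))
```

Because the rotation is orthogonal, the communality of each test and the total variance accounted for are unchanged; only the distribution of variance across factors shifts, as in the unrotated and varimax columns of Table 2.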

With the exception of the loading of serial 
recall on the factor identified as simulta- 
neous processing, Table 2 conforms closely 
to the pattern established in previous re- 
search. Simultaneous processing is defined 
by Raven’s matrices, figure copying, and the 
Memory-for-Designs Test; successive pro- 
cessing by serial recall, visual short-term 
memory, and digit span; and speed by word 
reading and color naming. The anomalous 
loading of serial recall on simultaneous 
processing is possibly explained by the fact 
that recall of the list in the present study was 
in writing; in past research the test was given 
individually and responses were given orally. 
By requiring written responses, the task al- 
lowed subjects to think in terms of four 
“slots” or columns on their answer page, 
which were to be filled with appropriate
items. Casual examination of the response
sheets indicated that a large number of
subjects did organize their answers in this
way, often leaving blanks for missing words.
This assigning of words to slots would be more
difficult with an oral response and probably 
contributes a spatial characteristic to the 
task. The major loading, however, remained 
on successive processing. 

Table 2
Principal Components Analysis with Varimax Rotation of Simultaneous-Successive Battery

                              Unrotated factor          Varimax factor
Variable                      1       2       3       Succ    Sim     Speed   Total
Raven’s matrices              591     430     -240    104     753     -120
Figure copying                601     459     -012    245     713     057
Memory for designs            540     544     -269    011     810     -054
Serial recall                 751     030     180     642     402     -154
Visual short-term memory      722     205     423     838     187     -076
Digit span                    547     311     470     785     -014    -045
Word reading                  536     626     286     -407    025     771
Color naming                  342     431     731     078     -122    904
Variance                      2.792   1.410   1.179   1.972   1.945   1.464
% of total variance           34.9    17.6    14.7    24.6    24.3    18.3    67.2

Note. n = 104. Succ = successive; Sim = simultaneous. Decimals have been omitted for factor loadings.


The loading of serial recall on simultaneous
processing, and the slot strategy that
it suggests, are reminiscent of Miller’s (1956) 
and Johnson’s (1970) work on chunking, 
coding, and the organization of free recall 
and of Estes’ (1974) discussion of how such 
complex processes could determine the 
performance in a test as seemingly simple as 
digit span. The basis for the linking of 
chunking or coding with simultaneous pro- 
cessing is given by Johnson’s (1970) defini- 
tion of a chunk as “any response set or se- 
quence which is represented in memory by 
a single code” (p. 173). Clearly, what has 
been called simultaneous processing is in- 
volved in the formation of such an entity. 
The role that this coding or simultaneous 
processing plays in a given task remains a 
function of the task demands. The present 
results suggest that the introduction of a 
written response in the serial recall task en- 
courages such a manner of processing. 

A similar procedure was followed for the
six PMA tests (Tests 9-14). Although only
two eigenvalues were greater than 1.0, previous
research (e.g., Thurstone, 1938) suggested
the extraction of three factors and an
oblique rotation. Three factors were extracted
and rotated to a promax criterion.
The result can be seen in Table 3. The
three factors are readily identifiable as
Spatial, Memory, and Reasoning. The latter
two could also be called Level I and Level II
ability (e.g., Jensen, 1970).

Tables 4 and 5 contain, respectively, the
correlations between the simultaneous-successive
factors and the PMA tests and
those between the PMA factors and the
simultaneous-successive tests. The correlations
between the factors of the two batteries
are in Table 6. The simplest way to examine
these data is to first inspect the correlations
between factors (Table 6) and to then
consider in detail the meaning of significant
correlations by examining the correlations
between factors and tests (Tables 4 and 5).
Because error is associated not only with the
correlation coefficient but also with the
factor scores, a conservative probability
(.01) is recommended for the ensuing correlations.


Table 3
Principal Components Analysis with Promax Rotation of Primary Mental Abilities Battery

Variables: figure grouping, word grouping, first-last names, word-number, spatial relations, card rotations.
Columns: unrotated factors 1-3; promax factors (pattern); h².
Variance of unrotated factors: 2.644, 1.089, .716 (44.1%, 18.2%, 11.9% of total variance).
Variance of first promax factor: 1.763 (29.4% of total variance).
[Individual factor loadings are not legible in this copy.]

Note. n = 104. Correlations between factors are as follows: memory-spatial = 30 …


Table 4
Correlations of Three Simultaneous-Successive Factors with Primary Mental Abilities Tests

                        Factor
Test                    Simultaneous    Successive    Speed
Figure grouping         375             065           -091
Word grouping           262             248           -263
First-last names        397             238           -150
Word-number             222             345           -104
Spatial relations       523             154           -155
Card rotations          408             164           -219

Note. r.99 (two-tailed) = 254. Decimals have been omitted.

In Table 6, four significant relationships 
can be seen: between simultaneous pro- 
cessing and each of the three PMA factors 
and between successive processing and 
memory. By far the strongest of these rela- 
tionships is between simultaneous process- 
ing and spatial ability. The strength of this 
relationship and the relatively lower corre- 
lation between simultaneous and reasoning 
support the interpretation that simultaneous 
processing is not merely reasoning (Das et 
al., 1975). Table 4 shows that simultaneous 
processing is highly correlated with both 
tests of spatial ability, and Table 5 shows 
similarly that the spatial factor is related to 
each of the simultaneous tests. 
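
The significance threshold quoted in the table notes (r = .254 with n = 104) can be related to the usual t statistic for a correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The following is a worked check under that standard formula; the function name is hypothetical.

```python
import math


def r_to_t(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Threshold from the table notes: r = .254, n = 104 (df = 102).
t_value = r_to_t(0.254, 104)
print(round(t_value, 2))  # 2.65
```

A t of about 2.65 on 102 degrees of freedom lies close to the two-tailed .01 critical value (roughly 2.6), which agrees with the conservative .01 level recommended in the text.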

Simultaneous processing is related to
reasoning but less so than it is to spatial, and
no more so than it is to memory. In Table 4
it can be seen that this moderate relationship
to reasoning is largely due to the one reasoning
test (figure grouping) that is nonverbal.
This suggests that the relationship
does not involve reasoning as such but rather
is a function of the figural stimuli used in one
of the reasoning tests. In summary, these
correlations argue against any identification
of simultaneous processing with inductive
reasoning. Furthermore, it appears that both
simultaneous and successive processing play
important roles in speech and language
comprehension: This is suggested by the
similar correlations of the simultaneous and
successive factors with the verbal reasoning
test, word grouping (Table 4).

Table 6 also shows that both simultaneous
and successive processing have moderate,
though significant, correlations with the
memory factor. The relation of simultaneous
processing to memory can be understood if
one recalls the discussion above implicating
this form of processing in coding or chunking.
The two paired-associate memory tasks
require the linking or relating of pairs of
items, and this linking can involve chunking.
That this is so is shown in Table 4 by the
significant correlation between simultaneous
processing and the first and last names test.
It would seem that in this task, subjects used
some sort of hierarchical coding strategy.

The relating of items required by the
paired-associate tasks can also be accomplished
by a sequential, rote-associative form
of coding. This is demonstrated by the correlation
between successive processing and
the memory factor (Table 6) and by that
between successive and the word-number
test (Table 4). In summary, then, it can be
seen that subjects tend to use two different
forms of coding or processing when confronted
with two seemingly similar memory
tasks. Because of this, no identification of
PMA memory with successive processing can
be supported from these data. Successive
processing, however, does appear to resemble
the concept of sequential association (though
perhaps at a higher, more general level) that
the PMA memory tasks were intended to
measure. In measuring this particular ability,
it seems that care must be taken to minimize
the impact of superordinate coding strategies
so that the results of sequential, associative
strategies can become apparent.


Table 5
Correlations of Three Primary Mental Abilities Factors with Simultaneous-Successive Tests

                              Factor
Test                          Spatial    Memory    Reasoning
Raven’s matrices              525        187       386
Figure copying                363        366       125
Memory for designs            433        346       374
Serial recall                 338        413       456
Visual short-term memory      241        384       233
Digit span                    173        268       193
Word reading                  …          …         -197
Color naming                  -243       -191      -267

Note. r.99 (two-tailed) = 254. Decimals have been omitted.


Table 6
Correlations of Three Simultaneous-Successive Battery Factors (Varimax Rotation) with Three Primary Mental Abilities (PMA) Battery Factors (Promax Rotation)

Rows: simultaneous, successive, speed. Columns: PMA battery factors.
[Only two entries, 215 and -244, are legible in this copy; their cells cannot be determined.]

Note. r.99 (two-tailed) = 254. Decimals have been omitted.


Exploratory Analysis


The correlations contained in Table 1 were
also submitted, together, to a principal
components analysis, and the four factors
with eigenvalues greater than 1.0 were rotated
to a varimax criterion (Table 7). Factor
1 of this analysis is very complex: Tests of
simultaneous processing, reasoning, and
spatial ability load on it. The other three
factors are clearer. Factor 2 is successive
processing (cf. Table 2). Factor 3 is defined
by the two PMA memory tests and by two
tests of simultaneous processing: This would
seem to support the suggestion made in the
preceding section that simultaneous processing
is involved in the coding required by
paired-associate tasks. Factor 4 is the speed
factor that was seen in Table 2.

Although Table 7 would seem to support
the distinctiveness of successive processing,
PMA memory, and speed, it also demonstrates
the important relationships among
inductive reasoning, spatial ability, and some
aspects of simultaneous processing. Perhaps
it is this latter conglomerate of abilities that
is behind what Jensen has referred to as
Level II ability. That such a factor can be
found, however, is not evidence for its unitary
nature. In fact, when five factors were
extracted in the 14-test analysis, the
first factor broke into two (reasoning-spatial,
simultaneous-spatial), whereas the other
three remained the same. These findings
emphasize the point that a factor analysis,
such as that reported in Table 7, cannot decide
between two models of cognitive abilities;
it can only demonstrate relationships.


Table 7
Principal Components Analysis of Eight Simultaneous-Successive Tests and Six Primary Mental Abilities Tests (Varimax Rotation)

                              Varimax factor
Variable                      1       2       3       4       h²
Raven’s matrices              680     103     237     055
Figure copying                274     141     698     157
Memory for designs            535     -033    517     026     555
Serial recall                 340     635     259     -124    602
Visual short-term memory      106     809     248     -041    728
Digit span                    036     761     070     -049    596
Word reading                  -029    -393    -042    721     676
Color naming                  -174    102     -108    881     828
Figure grouping               689     021     081     -027    483
Word grouping                 600     335     -128    -299    578
First-last names              291     165     596     -211    512
Word-number                   -066    283     728     -196    653
Spatial relations             740     089     238     -065    617
Card rotations                718     135     095     -195    580
Variance                      .969    2.08
% of total variance

Note. n = 104. Decimals have been omitted from factor loadings.

Conclusions 


The primary purpose of the present study
was to examine the relationships that exist
between the simultaneous-successive information
processing model of Das et al.
(1975) and the more traditional reasoning-memory,
or Level II - Level I, model of
abilities (e.g., Jensen, 1970). The present
study does not support an identification of
simultaneous and successive with
reasoning and memory. Simultaneous processing
was confirmed as being primarily
related to spatial ability, and to a lesser extent,
to both reasoning and memory. The
relationship with reasoning was mostly a
function of one reasoning test that used figural
stimuli. The relationship with memory
seemed to reflect the involvement of simultaneous
processing in superordinate or
hierarchical coding. (This latter point will be
elaborated below.) The PMA memory factor
was shown to be related to both simultaneous
and successive processing, indicating
that both of these modes of processing can
result in improved performance in paired-associate
tasks. An exploratory factor analysis
of the 14 tests included in this study
supported the distinctiveness of successive
processing and memory ability, as well as the
involvement of simultaneous processing in
memory ability. It further suggested that
what has been called Level II ability by
Jensen can also be seen as a conglomerate of
aspects of simultaneous processing, inductive
reasoning, and spatial ability.

Perhaps a simpler way of demonstrating
the nonidentity of the information-processing
and abilities models is to consider
that both simultaneous and successive processing
were shown to be involved in both
reasoning and memory tasks. Although this
does not deny the reality of the memory and
reasoning dimensions, it does suggest that
these dimensions are not isomorphic with
the information-processing dimensions.
The spatial nature of simultaneous processing
was confirmed, and a suggestion
made previously (Das et al., 1975, pp. …-100),
that simultaneous processing is involved
in coding or chunking, was supported.

Both modes of processing can be viewed as forms of coding. In
the case of simultaneous processing, the
coding is superordinate or hierarchical: A
unitary code is formed from the … In the
case of successive processing, the coding is
sequential or associative and a …-dependent
series of codes is formed. As forms
of coding, both modes of processing may
be susceptible to instruction, encouraging
the possibility of educational application.


Instructional Effects of Discrepancies in Content
and Organization Between Study Goals 
and Information Sources 


Ernst Z. Rothkopf and Mary E. Koether
Bell Laboratories 
Murray Hill, New Jersey 


Gagné and Rothkopf observed that study goals are less effective when the se- 
quence of the list of goals does not match the sequence of goal-relevant infor- 
mation in a passage. In their study, some goals could not be fulfilled from the 
information in the experimental passage. Unachievable goals may cause lower 
performance on out-of-order goals. One hypothesis is that students stop hunt- 
ing for out-of-order, goal-relevant information once they find that some goal- 
relevant information is not in the text. This hypothesis was tested in a replica- 
tion of the Gagné and Rothkopf experiment. The replication included condi- 
tions in which all needed, goal-relevant information was in the text. Learning 
of available, goal-relevant information was lower when the study guide and 
text sequences of goals did not match. This effect was observed both for
conditions having complete goal coverage and for conditions of incomplete
coverage. However, when there was complete goal-relevant information,
performance on available, out-of-sequence goals was higher than when some
goal-relevant information was missing.


We have been investigating the role of 
learning goals or objectives in learning from 
written discourse. These studies more or less 
follow the Type II incidental learning para- 
digm used by Postman (1964). In a typical 
experiment, subjects are provided with an 
explicit description of learning goals. They 
are given a written passage usually 500-3,000 
words in length and are asked to study it so 
as to learn and remember as much goal-rel- 
evant information as possible. Following 
this, subjects are tested about goal-relevant 
information and also about incidental 
background material. Usually a control 
group is included in the design. Control 
groups are not given learning goals but 
rather are told to learn as much from the 
passage as they can. 

The results of several experiments (Kaplan
& Rothkopf, 1974; Rothkopf & Billington,
1975; Rothkopf & Kaplan, 1972)
indicated that learning goals facilitate to a
marked degree the acquisition and/or the
retention of goal-relevant material. Such
facilitation diminishes to some degree as the
density of goal-relevant information in the
text increases. Incidental learning during
goal-guided study, on the other hand, is
somewhat depressed compared with that of
control groups.

Our findings have fairly clear practical
implications for teachers who give reading
assignments and who know explicitly what
they want their students to learn. At the
same time, our results have raised fundamental
questions about underlying psychological
processes in learning from written
discourse and about the character of goal-guided
study.

One of the unresolved problems, which is
both of practical and theoretical interest,
arises from discrepancies between the organization
of the list of learning goals and the
organization of the text (Gagné & Rothkopf,
1975). Two kinds of text were used in that
study. In one, information elements relevant
to both parts of certain two-part goals were
adjacent. In the other passage, the information
elements relevant to each of the two
parts of the learning goals were widely sep- 
arated in the text. Recall was tested after 
reading. Gagné and Rothkopf (1975) re- 
ported that information about the second 
goal elements was remembered better when 
the adjacent-element text was used than in 
the separated condition. The list of learning 
goals used by Gagné and Rothkopf included 
some goal elements that could not be ful- 
filled from the information contained in the 
experimental text. One interpretation of 
their findings was, therefore, that the pres- 
ence of unachievable goals modified the 
subject’s inspection activities so that 
subjects ceased to search for information 
relevant to out-of-order goal elements and 
acted as if the needed information was not 
available in the text. An alternative hy- 
pothesis was that weak performance on the 
second goal-relevant elements was entirely 
due to the distribution of goal-relevant in- 
formation in the text because dispersal of 
topically related information in the text does 
not foster effective memory structures. 
The purpose of the present experiment 
was to clarify the role of unachievable goals 
on the acquisition of dispersed, goal-relevant 
information elements. In order to do so, we 
replicated the original Gagné and Rothkopf 
study (1975), using goal lists that included 
nonachievable learning goals as well as lists 
in which all goals could be achieved by using 
the experimental text. 


Method 


Treatments. Table 1 shows the basic 3 X 2 factorial
design. Three kinds of goal guidance and two text
organizations were used.

The three levels of the goal guidance factor were (a)
all learning goals attainable from the reading passage,
(b) some learning goals attainable, and (c) a no-goal
control condition in which directions were to learn as
much as possible from the passage.

Both experimental goals treatments included a basic
list of 12 two-element goals. Information relevant to all
24 basic goal elements was available in the passage. The
distance between the information relevant to the two
elements of each of these basic goals was manipulated
through text organization. Both goals treatments also
received nine additional goal elements that could be
attained from the information available in the passage.
Thus the All Goals Attainable group received a total of
33 goal elements and the Some Goals Attainable
treatment, a total of 40 goal elements. Basic learning
goals were intermixed with the additional goal elements
in a nonsystematic way in lists for both goals treatments.
Each of the 12 basic two-element goals was a statement
of a topic with two elements to be learned about the
topic.¹ The goal elements directed the reader to look for
specific factual information. The goal elements were
designated “first” or “second” depending on the sequence
in which they were discussed in the passage. A
sentence in the experimental passage was relevant to
each element in the goal list² except for the seven goal
elements in the Some Goals Attainable treatment for
which no relevant information was contained in the
experimental passage. The control treatment received
no list of goals. Subjects in this condition were only
given the experimental passage and asked to learn as
much from it as possible.

Text organization factor levels were (a) adjacent text
organization and (b) dispersed text organization. In the
adjacent text organization, the two sentences relevant
to the two elements of each basic goal were adjacent in
the reading passage. In the dispersed text organization,
the two sentences relevant to each of the basic goals
were separated by from 10 to 33 sentences in the text.
The structure of the dispersed text passage was the
same as the adjacent text except that certain sentences
were moved so that for some goals, the discussion of one
goal element would be separated from the discussion of
the other element.

Passage. The passage was a 200-sentence, 10…
description of an imaginary planetary system that had
been used in a previous study (Gagné & Rothkopf,
1975).

¹ An example of a two-element study goal is “Tarran's
moons: Be able to tell how many moons Tarran has and
their approximate diameter.”

² Sentences in the passage that were relevant to the
two goal elements illustrated in Footnote 1 are the
following: “Tarran has four moons.” “The moons are
approximately 200 miles in diameter.”


Table 1
Experimental Design

Goal guidance                                  Text organization(a)

All learning goals attainable:                 adjacent goal-relevant sentences;
  12 basic two-element goals                   dispersed goal-relevant sentences
  plus 9 additional goal elements

Some learning goals attainable:                adjacent goal-relevant sentences;
  12 basic two-element goals                   dispersed goal-relevant sentences
  plus 9 additional goal elements
  plus 7 unattainable goal elements

No learning goals (control):                   adjacent goal-relevant sentences;
  Only directions to learn as much             dispersed goal-relevant sentences
  as possible about material

(a) The n in each condition = 24.


Table 2
Number of Correctly Recalled Responses to First and Second Elements of Two-part Goals for Goal Guidance Treatments and Text Organization

                                  Text organization
                                  Adjacent                        Dispersed
                                  1st             2nd             1st             2nd             M for goal
Goal guidance treatment           M       SD      M       SD      M       SD      M       SD      guidance treatments
All learning goals attainable     6.417   3.63    6.083   3.53    6.625   3.12    5.333   2.53    6.114
Some learning goals attainable    6.167   2.33    5.750   2.57    6.417   2.47    4.250   2.17    5.646
No learning goals (control)       4.708   2.16    4.250   2.35    4.667   2.10    4.125   2.07    4.438
M for text organization           5.764           5.361           5.903           4.569

Test. The test was a 54-item completion test for 
factual information that had been used in the previous 
study (Gagné & Rothkopf, 1975).³ Each of the two elements
of each of the 12 basic goals was covered by one
test item. Nine additional test items represented the 
nonmanipulated elements in the goals list. They were 
not used in the analysis. Twenty-one test items related 
to incidental material in the passage. Test-item se- 
quence was unsystematically ordered with respect to the 
location of the relevant information in the experimental 
passage. The same test was given to all subjects. 

Procedure. Subjects were assigned at random to the 
experimental conditions and were run in small groups 
of from one to three subjects. Each subject was given a 
large envelope that contained the passage, background 
questionnaire, and test in separate, numbered enve- 
lopes. (For details of this procedure, see Rothkopf and 
Coke, 1968.) Subjects in the experimental conditions 
received a list of study goals in the envelope with the 
passage. After reading the general directions on the 
outer envelope, a subject progressed through the three
inner envelopes, in sequence, at his own pace. Subjects 
were directed not to take notes and to record start and 
stop times for the activity called for by each envelope. 
The entire study took approximately 1 hour. 

Subjects. Paid volunteer college undergraduates (N 
= 144) participated in the study. Subjects were both 
men and women. 


Results 


Correct responses on goal-relevant test 
items were scored separately for the first and 
second part of all 12 basic two-part learning 
goals. These results are summarized for the 
three treatments and two organizations of 
the text in Table 2.

The data were consistent with the findings 
reported by Gagné and Rothkopf (1975). 
The Some Goals Attainable treatment, 
which is a replication of the condition of the 
Gagné and Rothkopf study, resulted in 
higher mean performance on goal-relevant 
items than that of the control group. The use
of the dispersed text produced weak performance
on the second goal elements, which
is consistent with earlier findings. These 
were “out of sequence” elements in that text. 
The effects produced by dispersed text on 
second goal elements were less pronounced 
in the All Goals Attainable treatment. 
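
The marginal means reported in Table 2 can be recomputed from the cell means as a check on the table. The sketch below is only that arithmetic; the dictionary layout and variable names are illustrative, and the cell means are copied from Table 2.

```python
# Cell means from Table 2, ordered as:
# adjacent 1st, adjacent 2nd, dispersed 1st, dispersed 2nd.
cell_means = {
    "All learning goals attainable": [6.417, 6.083, 6.625, 5.333],
    "Some learning goals attainable": [6.167, 5.750, 6.417, 4.250],
    "No learning goals (control)": [4.708, 4.250, 4.667, 4.125],
}

# Row means: mean for each goal guidance treatment (equal n per cell).
row_means = {name: sum(cells) / len(cells) for name, cells in cell_means.items()}

# Column means: mean for each text organization x goal element combination.
column_means = [sum(col) / len(col) for col in zip(*cell_means.values())]

for name, mean in row_means.items():
    print(f"{name}: {mean:.3f}")
print([round(m, 3) for m in column_means])
```

The column means make the pattern described in the text visible: second-element recall falls from about 5.36 with adjacent text to about 4.57 with dispersed text, whereas first-element recall is nearly unchanged.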

These conclusions were generally supported
by a 3 (Goals Treatment) X 2 (Text
Organization) X 2 (Goal Elements) analysis 
of variance with repeated measures from the 
same subject on the last factor. The effect of 
goals, F(2, 138) = 6.06, p < .05, goal elements 
sequence, F(1, 138) = 26.59, p < .01, and the 
Text Organization X Element Sequence interaction,
F(1, 138) = 7.64, p < .01, were
statistically reliable. 

Comparison of the overall mean of the All 
Goals Attainable group with that of the 
control group by Dunnett’s procedure was 
statistically reliable, t(138) = 2.385, p < .05.
The same comparison between the Some 
Goals Attainable group and the control 
group fell short of statistical significance, 
t(138) = 1.718, .05 < p < .10. This was a 
two-tailed comparison. Since the replicabil- 
ity of an earlier finding was being tested, a 
one-tailed test was probably appropriate and 
would be significant at the .05 level. The 
weak overall performance of the Some Goals 
Attainable group was primarily due to low 
levels of correct recall for information from 
the second goal element in the dispersed text 
condition. Second-goal-element items in the
dispersed text condition were approximately
at the level of the control group.

³ An example of the test items pertinent to the two-element
learning goal illustrated in Footnote 1
is, “All of Tarran's moons are approximately
___ mile(s) in diameter. Tarran has ___ moons (how
many?).”


Table 3
Number of Correctly Recalled Incidental Items for Goal Guidance Treatments and Text Organization

                                  Text organization
                                  Adjacent            Dispersed           M for goal
Goal guidance treatment           M        SD         M        SD         guidance treatments
All learning goals attainable     5.750    3.722      7.083    3.081      6.416
Some learning goals attainable    7.625    3.160      6.583    2.971      7.104
No learning goals (control)       8.625    4.563      8.333    4.170      8.479
M for text organization           7.333               7.333

Mean performance on second goal ele- 
ments for the All Goals Attainable treatment 
was higher for the adjacent text than the 
dispersed version; however, a t test between 
the two means fell short of statistical significance,
t(138) = 1.819, .1 > p > .05. These
results suggest that discrepancies between 
the sequential organization of learning goals 
and text sequence slightly depressed per- 
formance on out-of-sequence elements, but 
they were not sufficiently reliable to warrant 
a firm conclusion in this matter. 

A comparison of second-goal-elements 
performance in the dispersed text was also 
made between the All Goals Attainable and 
Some Goals Attainable treatments. Mean 
recall for out-of-sequence elements was 
higher for the All Goals Attainable than the 
Some Goals Attainable treatment. The difference
was reliable, t(138) = 2.619, p < .01,
and supports the conclusion that unattain- 
able goals accentuate the acquisition and/or 
retention problems produced by discrepan- 
cies between sequential organization of goals 
and text. This result therefore favors the 
inspection hypothesis described in the in- 
troduction. 
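
As a consistency check on the planned comparisons reported above, each t value can be divided into the corresponding difference between Table 2 means to recover the standard error it implies. This is an illustration and not the authors' computation; the labels and variable names are hypothetical, and the means and t values are copied from the text and Table 2.

```python
# (label, difference between means from Table 2, reported t with 138 df)
comparisons = [
    ("All vs. control, overall mean", 6.114 - 4.438, 2.385),
    ("Some vs. control, overall mean", 5.646 - 4.438, 1.718),
    ("All goals: adjacent vs. dispersed, 2nd elements", 6.083 - 5.333, 1.819),
    ("All vs. Some, dispersed text, 2nd elements", 5.333 - 4.250, 2.619),
]

# Implied standard error of each difference: difference / t.
implied_se = {label: diff / t for label, diff, t in comparisons}

for label, se in implied_se.items():
    print(f"{label}: implied SE = {se:.3f}")
```

The two overall comparisons imply nearly the same standard error (about 0.70), and the two second-element comparisons imply nearly the same standard error (about 0.41), which suggests that the reported t values and the Table 2 means are mutually consistent.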

Incidental learning. Performance on
incidental test items is summarized in Table
3. An analysis of variance indicated a significant
goals treatment effect, F(2, 138) =
3.788, p < .05. More incidental information
was acquired by the control group than by
either experimental group, although a
Newman-Keuls test indicated that only the
difference between the control and the All
Goals Attainable groups was reliable. This
difference was largely due to low incidental
learning resulting from the All Goals Attainable
treatment when adjacent text was
used. Incidental performance under that
condition was significantly lower than under
the Some Goals Attainable treatment with adjacent
text, t(138) = 2.47, p < .05. This
finding suggests that well-ordered correspondence
in content and organization between
goals list and text probably diminishes
inspection of material unrelated to learning 
goals. This reduction in inspection activities, 
in turn, produces lower performance on in- 
cidental background test items. 


Discussion 


The present experiment provides evidence 
that discrepancies in content between study 
goals and text magnify the instructional 
difficulties produced by discrepancies be- 
tween the sequential organization of the 
study goals and the source passage. When 
the text did not contain all the information
required by the study goals (the Some Goals
Attainable group), the acquisition or storage
of goal-relevant information that was out of
sequence in the text was impaired. One
plausible hypothesis is that this was an acquisition
phenomenon. Unattainable goals
and out-of-sequence goal elements are
functionally equivalent in the sense that the
subject at certain locations in the text may
not be able to discriminate between the two
conditions. After locating information relevant
to a first goal element in a dispersed
text, the subject finds no information about
the second goal element in the sentence(s)
that follows. One strategy that may be
adopted here is that the subject reads on in
the text until information relevant to the
second goal element is found. For the Some
Goals Attainable subject this forward search
will end in failure at times because the text
does not include all the needed information.
This should diminish the likelihood that the
subject will undertake and persist in search
for out-of-sequence elements on subsequent
occasions. The presence of unattainable
goals thus results in modification of inspection
activities. The subject begins to act as
if out-of-sequence goal-relevant information 
was not available in the text. 

Discrepancies in content and sequential 
organization between learning goals and in- 
formation sources are a matter of some prac- 
tical concern. Such discrepancies can be ex- 
pected to occur when teachers supply ex- 
plicit descriptions of learning goals but do 
not specify the text sources from which stu- 
dents may garner the needed information. 
This might occur in library assignments 
when the choice of information source is up 
to the student. Various bookkeeping devices, 
such as check lists for learning goals, may be 
of some use to the student under those cir- 
cumstances. It would be worthwhile to ex- 
plore these in order to take advantage of the 
marked facilitation of learning that explicit 
descriptions of learning objectives provide 
for the student. 

Measuring Individual Differences 
with an Information-Processing Model 


Wendy M. Yen 
CTB/McGraw-Hill 
Monterey, California 


An information-processing model was fitted to individual learning curves for 
378 students from fifth to tenth grade. Two types of learning materials were 
examined: paired associates and word definitions. Two parameters of the 
model displayed reliable individual differences: acquisition rate and long-term 
retention. Some individual differences also were found in how well students 
learned definitions under experimenter-paced drill versus student-controlled 
independent study. Learning scores based on the definitions material had 
moderately high correlations with traditional aptitude and achievement 
scores; correlations were lower for paired associates. No significant sex or 
Spanish-surname versus white group differences were found for the learning 
scores. Possible classroom applications of the learning scores and parameters 
are discussed. 


A recent review article (Yen, Note 1) ar- 
gued that information-processing models 
might offer a basis for measuring important 
individual differences that could be used in 
the individualization of instruction. Infor- 
mation-processing models typically involve 
three major states or types of memory in 
which information is assumed to reside: the 
unknown state (U), temporary or short-term 
memory (STM), and more permanent or 
long-term memory (LTM). The models hy- 
pothesize ways in which information is 
transferred from state to state as a function 
of structural assumptions of the models and 
model parameters. Parameters carry labels 
such as attention, the probability of trans- 
ferring information from STM to LTM, the 
probability of maintaining information in 
STM when interfering material is presented, 
the probability of retaining information in 
LTM, and so on. Information-processing 
models have been successful in explaining 
many important effects in learning experi- 
ments, particularly those with respect to the 
distribution of practice (for a review see 
Greeno & Bjork, 1973). 

These models are applied, almost without 
exception, to learning curves based on group 
averages, and little is known about their 
applicability to individual subjects’ learning 
curves. Atkinson, Brelsford, and Shiffrin 
(1967) estimated parameters of an infor- 
mation-processing model, using individual 
curves for nine Stanford students. On the 
basis of a chi-square test, the model would 
have been rejected for all but one of the 
students. Students differed in their param- 
eter estimates, but since different sets of 
parameters could produce similar fits to the 
same learning curve, the importance of the 
between-students variance was not clear. 
Using the same model as Atkinson et al. 
(1967), Hunt, Frost, and Lunneborg (1973) 
obtained individual parameter estimates for 
40 college students selected to represent four 
ability groups classified in terms of high and 
low quantitative and verbal ability. Some of 
the parameters displayed significant be- 
tween-groups differences, indicating the 
presence of important individual differences 
in parameter estimates. 

Sperber (1974) simulated a disordinal 
aptitude-treatment interaction that was 
found in the long-term retention of moder- 


Copyright 1978 by the American Psychological Association, Inc. All rights of reproduction in any form reserved. 



INDIVIDUAL DIFFERENCES IN INFORMATION PROCESSING 73 


ately and severely retarded children as a 
function of distribution of practice. This 
simulation was obtained by varying the value 
of an information-processing parameter that 
reflected the probability that an item was 
lost from a “familiar” state when an inter- 
vening item was presented. 

Atkinson (1972) used an information- 
processing model to determine an individu- 
alized computer-generated learning strategy 
(i.e., order in which items were studied), in- 
tended to maximize the rate at which 
subjects placed items in LTM. The com- 
puter-generated strategy was more effective, 
on the average, than strategies chosen by the 
subjects themselves. On the other hand, 
Ciccone and Brelsford (1976) found that 
subject-chosen strategies could be more ef- 
fective than a fixed-pace strategy provided 
by experimenters. 

If information-processing models fit in- 
dividual learning curves and their parameter 
estimates display reliable individual differ- 
ences, then the models could have important 
implications for the individualization of in- 
struction. For example, parameter estimates 
might imply that one student would benefit 
from massed practice in learning new infor- 
mation whereas another student would 
benefit from distributed practice. One stu- 
dent may have trouble encoding information 
into LTM, whereas another student loses 
information from LTM rapidly; the former 
student might benefit from training to im- 
prove encoding (elaboration) skills, and the 
latter might benefit from frequent review of 
material once it is learned. Perhaps some 
students benefit from instructor-provided 
pacing of practice (drill) in learning new in- 
formation and others learn better when they 
are left to judge for themselves how to orga- 
nize their study time and materials. 

The information-processing models hold 
the possibility of providing information 
about aptitude-treatment interactions that 
can be used in individualizing instruction. It 
is also of interest that learning measures 
typically display smaller ethnic group dif- 
ferences than do traditional aptitude tests 
(e.g., Rohwer, Ammon, Suzuki, & Levin, 
1971). Direct measures of learning might 
offer a way of obtaining useful information 
about students’ learning skills, information 
that is less influenced by cultural factors 
than is traditional ability information. 

Information-processing models have been 
developed using relatively meaningless in- 
formation, e.g., paired associates. It can be 
argued that models based on the learning of 
such material cannot be relevant to the type 
of learning that goes on in a classroom. Cor- 
relational studies indicate that performance 
on paired associates can have substantial 
relationships with measures of achievement, 
though such relationships are not always 
found (for a review see Yen, Note 1). It may 
or may not be that the information-pro- 
cessing models can be found to fit learning 
scores based on materials that are more 
similar to those learned in the classroom 
than are paired associates. Other potential 
problems in the practical application of in- 
formation-processing models are the 
equipment and amount of testing time nec- 
essary to obtain parameter estimates. Most 
research with the models has been done in 
laboratory settings, with individual testing 
of subjects taking several hours. For practical 
application to be feasible, the testing pro- 
cedure must be simplifiable. 

The present research examines the suit- 
ability of one information-processing model 
(Rumelhart, Note 2) for explaining indi- 
vidual learning curves obtained through 
group testing for students from fifth to tenth 
grade. (This model is a general one, and the 
major aspects of several other models, in- 
cluding some versions of Atkinson and 
Shiffrin's (1968) rehearsal model and Estes’ 
(1955) stimulus-sampling model, can be ex- 
pressed in terms of Rumelhart’s model. In 
his review, Bjork (1970) concluded that 
Rumelhart’s model is one of several models 
that best explain experimental results with 
respect to the distribution of practice.) In the 
present research the model is fit simulta- 
neously to two sets of experimenter-con- 
trolled learning trials, one of which is more 
massed than the other. Student-selected 
learning strategies also are examined. 
Learning is measured with paired associates 
material as well as with material intended to 
be more similar to that learned in the class- 
room. Correlations of the learning scores 
with standard aptitude and achievement 
measures also are obtained. 

Method 

Materials 


Two types of learning materials were examined: 
sentences and noun-consonant vowel consonant 
(noun-CVC) pairs. 

Sentence materials consisted of items that had two 
parts: (a) questions presented in multiple-choice format 
asking for an unusual word that fit a definition, e.g., 
“Which of these is a vest? A) ariel B) skua C) rampike 
D) jerkin E) kraft” and (b) simple sentence definitions 
of the words, e.g., “A jerkin is a vest.” An effort was 
made to keep the words in the sentence definitions 
(except for the new word to be learned) at or below the 
fourth-grade reading level (Taylor, Frackenpohl, & 
White, 1969). Items were reviewed by black and Span- 
ish-speaking reviewers in an attempt to revise or remove 
items that might be biased or have different meanings 
for different ethnic groups. The items in this study were 
selected from a larger set of items on the basis of ho- 
mogeneity of item difficulty as measured in pilot 
studies. 

Two forms of the sentences test (designated A and B) 
were used; the forms differed only in the items they 
contained. Items were assigned to forms randomly. 

The noun-CVC pair materials consisted of items that 
had two parts: (a) multiple-choice questions about the 
pairs, e.g., “frog? A) zam B) bim C) mab D) pid E) tek” 
and (b) the noun-CVC pairs, e.g., “frog mab.” Nouns 
were at a fourth-grade reading level or below (Taylor et 
al., 1969). The CVCs were chosen to have association 
values rated from 45 to 86 according to Archer (1960) 
and not to have a significant sex difference in associa- 
tion value. Noun-CVC pairs were made randomly. 
Those used in the present study were chosen from a 
larger group on the basis of being most homogeneous in 
difficulty as measured in pilot studies. 

Some students were given one or more of the fol- 
lowing tests in addition to the learning materials: Short 
Form Test of Academic Aptitude (SFTAA; 1970), Cal- 
ifornia Achievement Tests, Form A (CAT; 1970), and 
Comprehensive Tests of Basic Skills, Form S (CTBS). 


Procedure 


Sentence materials were presented under two con- 
ditions: drill and independent learning. Noun-CVC 
pairs were presented under a drill condition. 

Sentence definitions and questions were projected 
on a screen for a class by means of a filmstrip and were 
simultaneously read on a tape recorder; 5 sec were al- 
lowed for the presentation of definitions and 15 sec for 
questions. Students marked their answers on answer 
sheets that contained question numbers and answer- 
choice designators, e.g., 50 A B C D E. 

The sentences test consisted of 5 practice items and 
46 test items. The 46 items were broken into eight 
groups of 3 items each (massed groups) and two groups 
of 11 items each (distributed groups). After the practice 
items, students were presented with groups of items 
in the following order: 2 massed groups, 1 distributed 
group, 4 massed groups, 1 distributed group, and 2 
massed groups. 

In the drill condition a group of sentence items was 
presented in the following fashion. Students were pre- 
sented with a question about the first item in the group 
and possible answer choices. After marking their an- 
swers, students were told the correct answer, i.e., pre- 
sented the sentence definition for that item. A question 
and a sentence definition were then presented for each 
of the remaining items in the group. This constituted 
the first trial. Beginning again with the first item in the 
group, students were presented the same question (with 
a different set of distractors), followed by the sentence 
definition for that item. This second presentation of the 
questions and sentence definitions proceeded for all the 
items in the group in the same order as the items were 
presented on the first trial; this second presentation 
constituted the second trial. A total of four trials were 
given for a group of items. Four trials were then given 
for a new group of items. Short breaks were taken be- 
tween groups of items when students appeared to be 
getting tired or restless. Administration of the learning 
trials took about 1.5 hours. Four days later students 
were given a long-term retention (LTR) test, consisting 
of one question (but no sentence definitions) for each 
of the 46 items in order; administration of the LTR test 
took about 20 minutes. 

In the independent learning condition the sentence 
items were presented in the same pattern of massed and 
distributed groups as was done under the drill condition. 
A group of items was administered as follows. Students 
were presented with multiple-choice questions (but no 
sentence definitions) about each of the items in the 
group. This constituted the first trial. Students were 
then told to turn to a page in a study booklet that con- 
tained sentence definitions for all the items in the group. 
The definitions were read to the students, and the stu- 
dents were told that they had a certain amount of time 
to study the definitions on their own. At the end of the 
study period, students were presented with questions 
for each of the items in the group in order; this formed 
the second trial. As much time was spent on each group 
of items under the independent learning condition as 
was spent under the drill condition. Four days later the 
students were given an LTR test, just as was done under 
the drill condition. 

Noun-CVC pairs and questions were projected on a 
screen but were not read to the students; 5 sec were al- 
lowed for the presentation of pairs and 15 sec for ques- 
tions. The noun-CVC pairs test consisted of the same 
number of items as the sentences test, with items 
grouped in the same manner. In the noun-CVC drill 
condition, groups of items were presented in the same 
manner as the sentences drill condition with the ex- 
ception that on the first trial students were not asked 
a question about an item before the correct answer 
(noun-CVC pair) was presented. This resulted in four 
presentations of each noun-CVC pair but only three 
question trials for each item. An LTR test was given for 
the noun-CVC pairs just as it was for the sentences. 


Subjects 


Subjects for this study attended schools within a 
50-mile radius of Monterey, California. Testing condi- 
tions were assigned to classes randomly within grade, 
and the sample design appears in Table 1. 

Teachers were asked to identify students’ ethnic 
group membership as Caucasian or white, Spanish 
surname, or other. 


Trial Scores 


Proportion correct trial scores were computed by 
pooling scores for the 24 massed items to obtain massed 
trial scores and by pooling scores for the 22 distributed 
items to obtain distributed trial scores. For the sen- 
tences drill there were 4 trial scores and an LTR score 
obtained for the massed items as well as for the dis- 
tributed items, producing 10 scores. For the noun-CVC 
pair drill there were three trial scores and an LTR score 
obtained for the massed items and for the distributed 
items, producing 8 scores. For the noun-CVC pairs it 
was assumed that if the students had been tested on the 
pairs before they were told the pairs, they would have 
gotten .20 correct, i.e., the probability of guessing cor- 
rectly was 1/5; therefore, scores for the first trial for 
noun-CVC pairs were fixed at .2. 


Model 


Rumelhart’s (Note 2) modified general all-or-none 
forgetting theory (GFT) Markov learning model was 
used in analyzing trial scores under the drill condition. 
The GFT model assumes that items can be in one of 
three states: U, STM, or LTM. It is assumed that items 
begin in either LTM or U for the sentence materials but 
that all items begin in U for the noun-CVC pair mate- 
rials. Between the times that questions are asked about 
a particular item, the item can change states. (In the 
time between questions about an item, the item’s correct 
answer [sentence definition or noun-CVC pair] is pre- 
sented, and questions and correct answers for each of 
the other [2 or 10] items in the [massed or distributed] 
group are presented.) Items that are unknown can re- 
main unknown or be transferred to STM or LTM. Items 
in STM can remain in STM or be transferred to LTM 
or U. Items that enter LTM are assumed to remain in 
LTM during the learning trials. It is assumed that items 
in STM and LTM always are marked correctly and 
those in U are marked correctly by lucky guessing 
only. 

A modification of the GFT model is made for pre- 
dictions of LTR performance. It is assumed that during 
the 4 days between the learning trials and the LTR test, 
items can be lost from LTM. 

The modified GFT model has six parameters that are 
used in describing the probabilities that items are in 
particular memory states. 

The k is a parameter that reflects the probability of 
items being known (in LTM) before the testing begins. 
(It is assumed that k = 0 for noun-CVC pairs.) 

The γ is a parameter reflecting the probability that 
a student attempts to change the state of an item that 
is in U or STM when the correct answer for that item is 
presented. The γ can be interpreted to be an atten- 
tion-level parameter. 

The a is the probability that an item in the unknown 
state enters LTM when the item’s correct answer is 
presented, given that the correct answer is attended to 
when it is presented. The a can be interpreted as skill 
at encoding unknown information into LTM. 

The b is a parameter reflecting the probability that 
an item in STM enters LTM when the item’s correct 
answer is presented, given that the correct answer is 
attended to. The b can be interpreted to be skill at en- 
coding into LTM information that is familiar but that 


Table 1 
Sample Design 
Grade   No. of classes   First learning test   Academic aptitude test (level)   Second learning test   Achievement test (level) 
5 2 sentences Form A SFTAA (3) wines Form B CTBS (2) 
ill ri 

5 1 beu. Form A SFTAA (3) — CTBS (2) 
Ro = CTBS (2) 

5 2 sentences Form A — 

5 1 uae Fong A. SFTAA (3) pace Pu CTBS (2) 
i t 

5 o roe A ETTAN 2 CTBS (2) 

5 1 er eme der AS SFTAA (3) cm Form B CTBS (2) 
independent learning ncm Fon CTBS (2) 

i ; sonaas ibis STAN independent learning 

7 2 sentences Form À SFTAA (4) Nan oya Form A T 
oo = CAT (5) 

9 2 noun-CVC Form A = 
d SFTAA (5) E E 


10 2 noun-CVC Form A 

Note. SFTAA = Short Form Test of Academic Aptitude; CTBS = Comprehensive Tests of Basic Skills, Form S; CVC = consonant vowel consonant; CAT = California Achievement Tests, Form A. 
is not yet in LTM. 

The θ is the probability that an item in STM is 
maintained in STM when a question and correct answer 
for one other item in the group are presented. The θ^t is 
the probability that an item in STM is kept in STM 
when questions and correct answers for all the t other 
items in the group are presented. For massed items t = 2 
and for distributed items t = 10. Unless θ = 1 or θ = 0, 
increasing the number of items in a group (distributing 
practice) increases the probability that an item in STM 
is lost from STM in the time intervening between 
questions about that item. The θ can be interpreted as 
the tendency to keep information in STM. 
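
As a rough worked illustration of how group size affects maintenance in STM, take an assumed value of θ = .8 (chosen only for the arithmetic; it is not an estimate from this study). With t = 2 other items (massed) and t = 10 other items (distributed):

```latex
\[
\theta^{2} = (.8)^{2} = .64,
\qquad
\theta^{10} = (.8)^{10} \approx .11 .
\]
```

Under this assumed value, an item in STM would survive the intervening items about 64% of the time in a massed group but only about 11% of the time in a distributed group.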

The r is the probability that an item in LTM is 
maintained in LTM during the 4 days intervening be- 
tween the end of the learning trials and the LTR test. 
The r can be interpreted as skill at retaining informa- 
tion in LTM. 

In matrix notation the probability of a correct re- 
sponse on trial i (i = 1, ..., 4) for group size j (j = 1 for 
massed groups; j = 2 for distributed groups) is 

\[ p_{ij} = p_0' \, A_j^{\,i-1} \, p_c . \]

The $p_0'$ is a vector of probabilities that the items orig- 
inate in each state (in the order LTM, STM, U): 

\[ p_0' = (k,\ 0,\ 1 - k). \]

The $A_j^{\,i-1}$ is the matrix $A_j$ taken to the (i − 1)th power; 
$A_j^{0}$ is an identity matrix. The $A_j$ is the matrix of tran- 
sition probabilities for the movement of an item be- 
tween trials for group size j. Rows give the state on trial i 
and columns give the state on trial i + 1, both in the order 
LTM, STM, U: 

\[
A_j =
\begin{pmatrix}
1 & 0 & 0 \\
b\gamma & (1 - b\gamma)\theta^{t} & 1 - b\gamma - (1 - b\gamma)\theta^{t} \\
a\gamma & \gamma(1 - a)\theta^{t} & 1 - a\gamma - \gamma(1 - a)\theta^{t}
\end{pmatrix}
\]

Each row of $A_j$ corresponds to an item's state on the 
ith trial (when the ith question is asked about that 
item). Each column of $A_j$ corresponds to an item's state 
on the (i + 1)th trial (when the (i + 1)th question is 
asked about that item). Each element of $A_j$ gives the 
conditional probability of an item being in a certain 
state on the (i + 1)th trial, given that the item is in some 
particular state on trial i. 

The $p_c$ is the vector of probabilities of getting an item 
correct, given that the item is in a given state (LTM, STM, U): 

\[
p_c =
\begin{pmatrix}
1 \\ 1 \\ .2
\end{pmatrix}
\]

The probability of getting an item correct on the LTR 
test is assumed to be $p_{5j} = p_0' A_j^{4} A^{*} p_c$, where 
(rows and columns again in the order LTM, STM, U) 

\[
A^{*} =
\begin{pmatrix}
r & 0 & 1 - r \\
0 & 0 & 1 \\
0 & 0 & 1
\end{pmatrix}
\]

With the assumptions of the GFT model and the six 
parameters, the probabilities of items changing memory 
states are described. For example, aγ is the probability 
that an item that is in U on trial i is in LTM on trial i + 
1. The bγ is the probability that an item that is in STM 
on trial i is in LTM on trial i + 1. If a > b, it is more 
likely that an item in U will enter LTM than that an 
item in STM will enter LTM. Such a situation might 
occur if a student attempts to encode an item into LTM 
only when he or she is clearly aware that the item is 
unknown; a familiar item (an item in STM) is perceived 
as known and no attempt is made to encode it into 
LTM. When a > b, distributed practice is more bene- 
ficial than massed practice because items are not 
as likely to remain in STM with distributed practice. 

If b > a, it is more likely that an item in STM will 
enter LTM than that an item in U will enter LTM. This 
might occur if a student attempts to encode into LTM 
only information that appears familiar. When b > a, 
massed practice is more beneficial than distributed 
practice. Thus, the relative sizes of the values of the 
parameters a and b can offer aptitude-treatment in- 
teraction information, if these parameters and the dif- 
ference in their values are reliable. 
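
The matrix formulation above can be traced numerically. The following is a minimal sketch, not the author's program: it assumes the state order LTM, STM, U, reads the attention and STM-maintenance parameters as `gamma` and `theta`, and uses illustrative parameter values that are not estimates from the study. The helper names (`transition_matrix`, `retention_matrix`, `predict`) are hypothetical.

```python
# Sketch of the modified GFT Markov model (state order: LTM, STM, U).
# All parameter values below are illustrative assumptions.

def transition_matrix(a, b, gamma, theta, t):
    """One-trial transition matrix; t = number of other items in the group."""
    keep = theta ** t  # chance an STM item survives the t intervening items
    return [
        [1.0, 0.0, 0.0],
        [b * gamma, (1 - b * gamma) * keep,
         1 - b * gamma - (1 - b * gamma) * keep],
        [a * gamma, gamma * (1 - a) * keep,
         1 - a * gamma - gamma * (1 - a) * keep],
    ]

def retention_matrix(r):
    """State changes over the retention interval: LTM kept with probability r."""
    return [[r, 0.0, 1 - r], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def predict(a, b, gamma, theta, k, r, t, n_trials=4, guess=0.2):
    """Return [p_1, ..., p_n_trials, p_LTR] for one group size."""
    p_c = [1.0, 1.0, guess]          # P(correct | LTM, STM, U)
    state = [k, 0.0, 1 - k]          # starting state probabilities
    A = transition_matrix(a, b, gamma, theta, t)
    probs = []
    for _ in range(n_trials):
        probs.append(sum(s * c for s, c in zip(state, p_c)))
        state = vec_mat(state, A)    # advance one trial
    ltr_state = vec_mat(state, retention_matrix(r))
    probs.append(sum(s * c for s, c in zip(ltr_state, p_c)))
    return probs

params = dict(a=0.5, b=0.0, gamma=0.8, theta=0.9, k=0.1, r=0.7)
massed = predict(t=2, **params)
distributed = predict(t=10, **params)
```

With b = 0 and these assumed values, the sketch gives higher trial scores for the massed group but a higher predicted LTR score for the distributed group, which is the qualitative pattern the a > b case describes.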
Estimates of the model parameters were obtained for 
each student through an iterative minimum chi-square 
computer program. The parameter estimates minimized 
the function 

\[
\chi^{2}(df) = \sum_{j=1}^{2} \sum_{i=1}^{5}
\frac{(o_{ij} - N_j p_{ij})^{2}}{N_j \, p_{ij} (1 - p_{ij})},
\]

where $o_{ij}$ is the observed number of correct responses 
for trial i and group size j, $N_j$ is the number of items in 
group size j ($N_1$ = 24; $N_2$ = 22), $p_{ij}$ is the predicted 
probability of a correct response for trial i and group size 
j (which is a function of the estimated parameters), and 
df is the degrees of freedom of the chi-square statistic. 
The df equals the number of independent cells in the 
chi-square statistic minus the number of independent 
parameters estimated. There were 10 independent (i.e., 
locally independent; Lord & Novick, 1968) cells for the 
sentences materials and 8 independent cells for the 
noun-CVC materials. The number of independent pa- 
rameters estimated varies as a function of the model 
being examined. 
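
To make the minimum chi-square idea concrete, here is a small sketch under stated assumptions. It is not the iterative program used in the study: it substitutes a plain grid search, and it fits only one parameter of a simplified two-state model (unknown or LTM) in which an unknown item enters LTM with probability `c` at each presentation. The function names and the observed counts are hypothetical.

```python
def chi_square(observed, n_items, predicted):
    """Sum over cells of (o - N p)^2 / (N p (1 - p))."""
    return sum((o - n_items * p) ** 2 / (n_items * p * (1 - p))
               for o, p in zip(observed, predicted))

def two_state_predictions(c, k=0.0, n_trials=4, guess=0.2):
    """P(correct) on each trial when items are either unknown or in LTM."""
    preds = []
    for i in range(n_trials):
        in_ltm = 1 - (1 - k) * (1 - c) ** i   # probability of being in LTM
        preds.append(in_ltm + guess * (1 - in_ltm))
    return preds

def fit_c(observed, n_items, steps=999):
    """Grid search over c in (0, 1) for the minimum chi-square value."""
    grid = [(s + 1) / (steps + 1) for s in range(steps)]
    return min(grid, key=lambda c: chi_square(
        observed, n_items, two_state_predictions(c)))

# Hypothetical numbers correct out of 22 items on Trials 1-4.
observed = [4, 13, 18, 20]
c_hat = fit_c(observed, n_items=22)
```

A grid search is used here only because it is easy to verify; any numerical minimizer applied to `chi_square` would serve the same purpose.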


Results 


Since proportion correct scores cannot 
necessarily be assumed to be normally dis- 
tributed, nonparametric significance tests 
are used when dealing with small samples. 
Table 2 
Means, Standard Deviations, and Reliabilities of Sentences Form A Drill Scores for Fifth-Grade Students 

Variable M SD rAA rAB 
Massed items 
Trial 1 .28 .08 23 25 
Trial 2 .83 5 72 14 
Trial 3 90 A3 A72 67 
Trial 4 94 .09 63 -78 
LTR 54 16 .60 61 
Distributed items 
Trial 1 29 09 08 18 
Trial 2 67 18 72 50 
Trial 3 84 AT -70 54 
Trial 4 88 15 ‘71 .56 
LTR 12 18 15 .63 


Note. LTR = long-term retention. 


Within-form reliabilities (rAA) are aug- 
mented correlations between scores based on 
the first half of Form A and the second half 
of Form A; these correlations are augmented 
by the Spearman-Brown formula (Lord & 
Novick, 1968). Between-form reliabilities 
(rAB) are correlations between scores on 
Form A and scores on Form B. 
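
The Spearman-Brown step-up for a split-half correlation can be sketched as follows. The function name and the example correlation are illustrative assumptions, not values from the study.

```python
def spearman_brown(r_half, factor=2.0):
    """Step a part-test correlation up to a test `factor` times as long."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)

half_test_r = 0.50  # hypothetical correlation between the two half-forms
full_test_reliability = spearman_brown(half_test_r)
```

For a half-test correlation of .50, the stepped-up reliability is 2(.50)/(1 + .50), or about .67.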


Sentences Under Drill (Fifth-Grade 
Students) 


This section presents means, standard 
deviations, and reliabilities of trial scores and 
parameter estimates for sentences material 
given under the drill condition. Correlations 
of the learning scores with aptitude and 
achievement test scores also are presented. 

Table 2 presents the means, standard 
deviations, and reliabilities of the Form A 
trial and LTR scores. It should be noted that 
the low standard deviations and reliabilities 
of the Trial 1 scores do not mean that these 
scores should not be used, but they indicate 
only that there were no reliable individual 
differences in the scores, i.e., the students 
had about the same amount of prior knowl- 
edge of the vocabulary items. Mean trial 
scores were higher for items in the massed 
groups than in the distributed groups; when 
practice was massed, it was easier to keep an 
item in STM and get the item correct. 
However, mean LTR scores were higher for 
distributed groups than for massed groups, 
t(128) = 15.9, p < .001. 

The LTR scores for the massed and dis- 
tributed groups had a correlation of .72. The 
reliability of the difference between the 
massed and the distributed LTR scores was 
.00 within form and .33 between forms; 
there were no reliable individual differences 
with respect to how much was learned in the 
massed groups relative to the distributed 
groups. 

Preliminary analyses were made of the fit 
of the learning model to individual students’ 
trial scores. The model fit the trial scores of 
virtually all the students, but the resulting 
parameter estimates were all unreliable. It 
appeared that the model was too flexible; 
different sets of parameter values could be 
found that produced very similar predicted 
trial scores. A restriction of the model to 
have a = b was considered; such a restriction 
would require that LTR scores for massed 
and distributed groups of items be the same. 
Since LTR scores typically were lower for 
massed groups than for distributed groups, 
the restriction that a = b did not appear 
appropriate. A restriction that b = 0 implies 
that LTR scores for massed items must be 
less than or equal to LTR scores for distrib- 
uted items, which is compatible with ob- 
served LTR scores. In obtaining param- 
eter estimates, the constraint was made that 
b = 0. 

Table 3 
Means, Standard Deviations, and Reliabilities of Parameter Estimates for Sentences Form A Drill Scores for Fifth-Grade Students 

Variable a γ 
M 54 85 
SD 20 9 
rAA .69 «33 
n^a 103 107 
rAB JT3 „41 
n^b 38 41 

Variable θ k r aγ 
M n 68 51 
SD is 09 16 E, 
rAA ‘02 39 66 B2 
n^a 107 118 103 m 
rAB —.10 31 Al 4 
n^b Al 48 45 45 

a Sample size for M, SD, and rAA. Correlations in Table 4 are based on this same sample with missing observation pairs excluded. 
b Sample size for rAB. 

Table 4 
Intercorrelations of Parameter Estimates for Sentences Form A Drill Scores for Fifth-Grade Students 

Variable a γ θ k r aγ LTR massed LTR distributed 
a 1.00 49 —.05 .19 27 94 66 
γ 1.00 —.39 .09 06 -73 45 
θ 1.00 —.06 36 —48 —.06 
k 1.00 17 4 .27 
r 1.00 «34 75 
aγ 1.00 73 
LTR massed 1.00 

Note. LTR = long-term retention. Correlations are based on the same sample as that for Table 3, and sample sizes appear in Table 3. 

Examination of the model reveals that in 
some cases parameter values are indeter- 
minate. For example, if θ = 0, a and γ cannot 
be estimated separately, although aγ can be 
estimated. When a student learns very little, 
the parameter r cannot be estimated. The 
model implies that trial scores should be 
“smooth” and nondecreasing. If a student's 
scores are not compatible with the model, a 
significant chi-square can be produced 
and/or parameter estimates that appear 
unreasonable can be produced. Formal, 
computerized rules for deciding when pa- 
rameter estimates were determinate and 
“reasonable” were used. Decisions were 
made independently for each testing of each 
student (i.e., results from one testing of a 
student were not considered when examining 
the results from another testing of that stu- 
dent). Only determinate, reasonable pa- 
rameter estimates are used in calculating 

For the sentences Form A, 6% of the pa- 
rameter sets produced chi-square values (df 
= 5) that were significant at the .05 level. If 
the learning model is “true,” a chi-square 
test at the .05 level is expected to reject 5% 
of the parameter sets. Since only 6% of the 
parameter sets were rejected, it appeared 
that the model was suitable for the observed 
trial scores as a group, and a significant chi- 
square was not necessarily taken as reason 
to reject a subject’s parameter estimates as 
unreasonable. Parameter sets were rejected 
for a total of 9% of the students because the 
sets were indeterminate or unreasonable. 
Tables 3 and 4 contain the means, standard 
deviations, reliabilities, and intercorrelations 
of the parameter estimates that were not 
rejected. 

The parameters a and r are the only ones 
that appear to be reasonably reliable; they 
also are fairly uncorrelated. The parameter 
aγ is more reliable than a or γ taken sepa- 
rately, and its estimate is available for more 
students. In subsequent analyses aγ and r 
were focused upon as parameters of primary 
interest. 

Table 5 
Means, Standard Deviations, and Correlations of SFTAA Scores with Sentences Form A Drill Scores for Fifth-Grade Students 

Correlation with 
SFTAA score M SD LTR massed LTR distributed aγ r Multiple correlation^a 
ates 479 64 37 43 e 
ND dez Ter 38 43 ^ EE 5 | 
Elm cv 00 43 50 51 ^ M 5e ay 

Note. The n > 57. SFTAA = Short Form Test of Academic Aptitude; LTR = long-term retention. 
a Based on LTR massed, LTR distributed, aγ, and r. 
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Table 6 
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Means, Standard Deviations, and Correlations of CTBS Scores with Sentences Form A Drill 


Scores for Fifth-Grade Students 


Correlations with 


CTBS LTR LTR i 
score M SD massed distributed ay r M 
Reading 475 64 .50 48 Hd 33 
Language 491 58 .50 48 a 33 E: 
Math 458 52 42 42 ‘51 — 30 ‘56 
Total 460 61 ‘51 48 BT 6M ‘64 


Note. The n = 103. CTBS = Comprehensive Tests of Basic Skills, Form S; LTR = long-term retention.


a Based on LTR massed, LTR distributed, ay, and r. 


Means and standard deviations of SFTAA 
scores and correlations of these scores with 
learning scores and parameter estimates are 
contained in Table 5. Correlations of learn- 
ing scores with SFTAA scores are moderate. 
Total SFTAA score is .25 of a standard de- 
viation above the national fifth-grade mean. 
The variance of the total SFTAA score is 
61% of the national fifth-grade variance, 
which indicates substantial restriction of 
range in the sample. 
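
The likely effect of this restriction on the observed correlations can be gauged with the standard correction for direct range restriction (often called Thorndike's Case 2). The sketch below is illustrative only and is not an analysis reported in the study. It assumes that selection operated directly on the SFTAA total score and that the regression of the learning score on SFTAA is linear and homoscedastic. The function name and the observed correlation of .40 are hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted: float, variance_ratio: float) -> float:
    """Estimate the unrestricted correlation (Thorndike Case 2).

    variance_ratio is the sample variance divided by the reference
    population variance on the selection variable (e.g., 0.61 for the
    SFTAA total score in the fifth-grade sample).
    """
    if not 0.0 < variance_ratio <= 1.0:
        raise ValueError("variance_ratio must be in (0, 1]")
    if not -1.0 < r_restricted < 1.0:
        raise ValueError("r_restricted must be strictly between -1 and 1")
    # u is the ratio of unrestricted to restricted standard deviations.
    u = 1.0 / math.sqrt(variance_ratio)
    ru = r_restricted * u
    return ru / math.sqrt(1.0 - r_restricted ** 2 + ru ** 2)


# Hypothetical observed correlation; 0.61 is the variance ratio given in the text.
observed_r = 0.40
corrected_r = correct_for_range_restriction(observed_r, 0.61)
print(f"observed r = {observed_r:.2f}, corrected r = {corrected_r:.2f}")
```

Under these assumptions a hypothetical observed correlation of .40 would rise to roughly .49, which gives a sense of the size of the attenuation at issue.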

Means and standard deviations of CTBS 
scores and correlations of these scores with 
learning scores appear in Table 6. These 
correlations are moderate in size. Total 
CTBS score is .27 of a standard deviation 
above the national fifth-grade mean. The 
variance of the total CTBS score is 54% of 
the national fifth-grade variance, which 
reemphasizes the restriction of range in the 
sample. 

Multiple correlations of SFTAA with 
CTBS scores ranged from .74 (Math) to .85 
(Total); adding learning scores as predictors 
of CTBS scores increased these multiple 
correlations by at most .02. 

For the Form A distributed groups of 
items, the estimated probability that an 
unknown item was transferred to and re- 
mained in STM, «(1 — a) 6", had a mean of 
.10 and a standard deviation of .09; this in- 
dicated that on the average, few items in the 
distributed groups were placed in and re- 
mained in STM from trial to trial. It was 
decided to estimate ay and r by using only 
distributed groups of items with a model that 
assumed that θ = 0, i.e., a model that as-
sumed that no information was kept in STM
from trial to trial. This is equivalent to using 
a two-state model in which items are either 


unknown or in LTM. This two-state model 
estimates three parameters, (ay)*, r*, and 
k*. By using a chi-square with 2 degrees of 
freedom, the model was rejected at the .05 
level for 2% of 129 students, which indicates
that the model was reasonable. The ay and 
(ay)* had a correlation of .97; the r and r* 
had a correlation of .82. Reliabilities and 
correlations of the parameters with SFTAA 
and CTBS were altered very little in the al- 
ternative, simplified parameter estimation 
method. These results indicated that test 
administration time could be cut in half 
without substantial loss of information. 
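
Whether a rejection rate of about 2% is in line with the nominal .05 level can be checked with a simple binomial calculation. The sketch below is illustrative and not part of the reported analysis. It assumes that the 129 chi-square tests are independent and treats 2% of 129 as roughly 3 rejections; the exact count is not given in the text.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


n_students = 129
alpha = 0.05
expected_rejections = n_students * alpha  # rejections expected if the model is true

# Assumption: 2% of 129 students corresponds to about 3 rejected models.
observed_rejections = 3
prob_at_most_observed = binom_cdf(observed_rejections, n_students, alpha)

print(f"expected rejections under the model: {expected_rejections:.2f}")
print(f"P(X <= {observed_rejections}) = {prob_at_most_observed:.3f}")
```

A count at or below the expected number of rejections gives no indication of misfit, which is consistent with the conclusion that the two-state model was reasonable.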


Sentences Under Independent Learning 
(Fifth-Grade Students) 


This section presents means, standard 
deviations, and reliabilities of trial scores for 
sentences material presented under the in- 
dependent learning condition. Correlations 
of these learning scores with aptitude and 
achievement scores also are presented.


Table 7
Means, Standard Deviations, and Reliabilities
of Sentences Form A Independent Learning
Scores for Fifth-Grade Students
Variable M SD* raa” ru 
lite " 

Mira 1 sm 323 10 08 4 
Trial 2 93 Al 78 19 
LTR T 16 59 56 
es it j 

AS ig ve 25 A «o0 -13 
Trial 2 19, 18 59 59 
LTR AT 19 62 e 

Note. LTR = long-term retention.

5n 259. 

bn = 31.


Table 8

Means, Standard Deviations, and Correlations 
of SFTAA Scores with Sentences Form A 
Independent Learning Scores for Fifth-Grade 
Students 


                              Correlation with LTR      Multiple
SFTAA score     M     SD     Massed   Distributed    correlation*
Language 444 64 A5 37 AT 
Nonlang- 

uage — 469 63 32 20 32 
Tod 44 0 — M 3 A5 


Note. The n = 79. SFTAA = Short Form Test of Academic
Aptitude; LTR = long-term retention.
* Based on LTR massed and LTR distributed.


Table 7 contains means, standard devia- 
tions, and reliabilities of the Form A inde- 
pendent learning trial and LTR scores for 
those students who were given Form A be-
fore they were given any other materials. As
under the drill condition, students had
higher mean LTR scores
for the distributed groups of items than for
the massed groups, t(78) = 5.4, p < .001. The
LTR scores for the massed and distributed
groups had a correlation of .57. The
reliability of the difference between the
massed and distributed LTR scores was .18
within form and .44 between forms.

Table 8 contains means and standard de-
viations of SFTAA scores and correlations
of the SFTAA scores with Form A indepen-
dent learning LTR scores for those students


WENDY M. YEN


Table 9
Means, Standard Deviations, and Correlations of CTBS Scores with Sentences Form A
Independent Learning Scores for Fifth-Grade Students


striction of range. Means and standai 
viations of CTBS scores and correlati 
CTBS scores with Form A indepen 
learning LTR scores (for those stud 
given Form A before any other mate 
appear in Table 9. Correlations are mo 
ately high. The mean CTBS total scon 
-14 of a standard deviation above the 
tional fifth-grade mean, and the varian 
CTBS total scores was .82 of the nati 
fifth-grade variance, which indicates a 
restriction of range. 


Sentences Under Drill Versus 
Independent Learning (Fifth-Grade 
Students) 


A comparison was made between 
LTR scores on Form A sentences under 
versus independent learning conditio 


condition were significantly higher | 
mean scores under the independent lean 
condition for both the massed, (1, 20% 
67.3, p € .001, and the distributed, F(1;! 
= 94.5, p < .001, groups of items. A 6 
parison between drill and independ 
learning conditions was also made for. 


B independent learning tests. This c 
Parison was a two-tailed Wilcoxon test ba 


Note. The n = 72. CTBS = Comprehensive Tests of Basic Skills, Form S; LTR = long-term retention.
a Based on LTR massed and LTR distributed.


Table 10

Means, Standard Deviations, and Reliabilities

of Noun-CVC Form A Drill Scores for Ninth-
and Tenth-Grade Students

Variable M SD Tas 
Massed items 
Trial 2 93 m m 
Trial 3 96 06 ^ 
Trial 4 -98 M 5 
| LTR 50 9 m 
| Distributed items 
Trial 2 8 16 M 
Trial 3 83 15 B5 
‘Trial 4 $95 M E" 
LTR 69 As 62 
Note. Trial 1 scores are fixed at 2; the n = 8. CVC = consonant
vowel consonant; LTR = long-term retention.


they were tested under the drill condition 
before the independent learning condition 
(p < .001; n = 30).

The correlation between total LTR scores 
under the drill and the independent learning 
condition was .39 when the former condition 
was used first and .51 when the latter con- 
dition was used first. The reliability of the 
difference between total LTR scores under
the drill and the independent learning con-
ditions was .39 when the drill condition was
administered first and .57 when the inde-
pendent learning condition was adminis-
tered first. In other words, in one class there
was some evidence of reliable differences
in the relative benefits of the drill and the
independent learning conditions. For that
class, the correlation of SFTAA total score
with the difference between drill and inde-
pendent learning total LTR scores was .04.
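
The reliability of a difference score depends on the reliabilities of the two component scores and on their intercorrelation. The sketch below uses the classical test theory formula to show that dependence. It is offered as an illustration and is not necessarily the computation used in the study. The component reliabilities of .63 are hypothetical; only the correlation of .39 between drill and independent learning LTR scores comes from the text.

```python
def difference_score_reliability(rel_x: float, rel_y: float, r_xy: float,
                                 sd_x: float = 1.0, sd_y: float = 1.0) -> float:
    """Classical test theory reliability of the difference X - Y.

    rel_x, rel_y: reliabilities of the two component scores.
    r_xy: correlation between the two observed scores.
    sd_x, sd_y: standard deviations of the observed scores.
    """
    cov_xy = r_xy * sd_x * sd_y
    true_variance = rel_x * sd_x ** 2 + rel_y * sd_y ** 2 - 2.0 * cov_xy
    observed_variance = sd_x ** 2 + sd_y ** 2 - 2.0 * cov_xy
    if observed_variance <= 0.0:
        raise ValueError("difference score has no observed variance")
    return true_variance / observed_variance


# Hypothetical reliabilities of .63 for each total LTR score;
# .39 is the drill-first correlation reported in the text.
rel_diff = difference_score_reliability(0.63, 0.63, 0.39)
print(f"reliability of the difference = {rel_diff:.2f}")
```

The formula makes clear why difference scores are reliable only when the components are themselves reliable and not too highly correlated: as r_xy approaches the average component reliability, the reliability of the difference approaches zero.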


Noun-CVC Pairs Under Drill (Ninth-
and Tenth-Grade Students) 


This section presents the means, standard 



Table 11


Means, Standard Deviations, and Reliabilities of Parameter Estimates for Noun-CVC Form A
Drill Scores for Ninth- and Tenth-Grade Students


Table 12
Intercorrelations of Parameter Estimates for Noun-CVC Form A Drill Scores for Ninth- and


Tenth-Grade Students


LTR 
Variable a Y 6 r ay Massed Distrib 
a 1,00 494 —.09 21 .99 52 67 — 
m 1.00 =A7 05 a7 08 109 
0 .00 .25 -12 —.18 07 
r 1,00 —.16 .50 .60 
ay 1.00 63 ‘66 
LTR massed 1.00 58 7 


Note. CVC = consonant vowel consonant; LTR = long-term retention. Correlations are based on the same sample as that for Table
11, and sample sizes appear in Table 11.


and the variance in total SFTAA score is .68 
of the national tenth-grade variance. The 
SFTAA scores have low positive and nega-
tive correlations with learning scores, and it
should be noted that the correlations are 
based on a small sample. 

Table 14 contains means and standard 
deviations of CAT scores and correlations of 
these scores with learning scores. Mean 
reading score is .34 of a standard deviation 
above the national ninth-grade mean, and 
the variance in reading scores is .69 of the 
national ninth-grade reading variance. 
Correlations of CAT scores with learning 
scores are low. 


Sentences and Noun-CVC Pairs Under 
Drill (Seventh-Grade Students) 


This section compares learning scores 
obtained from sentences materials with 
those obtained from noun-CVC materials. 

Tables 15 and 16 contain means, standard 
deviations, within-form reliabilities, and 
intercorrelations of sentences and noun- 
CVC drill learning scores. Comparisons of 
the sentences means in Table 15 with those 
in Table 2 indicate some growth in learning 
skills between Grade 5 and Grade 7. Part of 
this difference between grades in LTR 
means can be attributed to a higher estimate 
of k in the seventh-grade sample (.30) than 
in the fifth-grade sample (.11). Also, total 
SFTAA score was .43 of a standard deviation 
above the national seventh-grade mean for 
this sample and only .25 of a standard de- 
viation above the fifth-grade mean in the 
fifth-grade sample. An analysis of covar-
iance, controlling for SFTAA total scores,


revealed no significant growth in seni 
LTR scores for massed, F(1, 98) € 1, oF 
tributed, F(1, 98) < 1, item groups bet 
Grade 5 and Grade 7. d 

Reliabilities and standard deviatior 
the learning scores are fairly similar a 
grades for comparable materials. Con 
tions between learning scores for 
tences and for the noun-CVC mate! 
fairly low, and it should be noted that tl 
correlations are based on relatively sn 
samples. As was the case with the fil 
ninth-, and tenth-grade samples, the 
substantial restriction of range in th 
enth-grade sample; variance in total SF! 
score was 50% of the seventh-grade nati 
variance. 

Correlations of sentences and noun-C 
learning scores with SFTAA scores arec 
tained in Table 17. The LTR scores and 
rameter estimates have higher correlati 
with SFTAA scores for sentences than 
noun-CVC materials. 

Multiple correlations of the two param 
estimates with SFTAA scores for the t 
tences and for the noun-CVC mate 


Table 13
Means, Standard Deviations, and Correlations
of SFTAA Scores with Noun-CVC Form A
Drill Scores for Tenth-Grade Students


SFTAA Correlation with 7 
score M SD Massed _Distribu 
Language 547 91  —36 -42 
Nonlanguage 548 71 122 BL 
Total 53 8 -22 04 


Note. The n = 31. SFTAA = Short Form Test of Academic
Aptitude; CVC = consonant vowel consonant; LTR = long-term
retention.


Table 14


Means, Standard Deviations, and Correlations of CAT Scores with Noun-CVC Form A Drill


Scores for Ninth-Grade Students


Correlation with 


CAT LTR LTR Multi 
score M SD massed distributed ay r corals 
Reading 590 79 11 34 : 
Math 540 72 11 28 2 » En 
Language 566 61 12 ‘02 Ho -02 25 


Note. The n 2 45. CAT = California Achievement Tests, Form A; CVC = consonant vowel consonant; LTR = long-term 


retention. 
* Based on LTR massed, LTR distributed, ay, and r. 


separately and together appear in Table 18. 
Use of both types of learning materials in the 
prediction substantially increases the mul- 
tiple correlations for the language and non- 
language scores. It should be kept in mind 
that these results and those in Table 17 are 
based on small samples. 
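
How a multiple correlation based on two predictors, such as ay and r, relates to the individual correlations can be shown with the standard two-predictor formula. The sketch below is an illustration and not a reanalysis of the study's data. The three input correlations are hypothetical values chosen for the example, and the function name is an assumption.

```python
import math


def multiple_r_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical values: criterion correlations of .62 and .25,
# and a correlation of .46 between the two predictors.
R = multiple_r_two_predictors(0.62, 0.25, 0.46)
print(f"multiple R = {R:.2f}")
```

When a second predictor is correlated with the first and only weakly with the criterion, the multiple correlation barely exceeds the larger single correlation. A substantial gain, as described for the combination of sentences and noun-CVC estimates, requires predictors that contribute partly independent information.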


Sex and Ethnic Group Differences 


Two-tailed Mann-Whitney tests for sex 
differences were carried out for the massed 
and the distributed LTR scores. No signifi- 
cant differences were found for the sentences 
drill scores for fifth graders (p < .16 to p <
.32 based on 67 girls and 62 boys), the sen-
tences independent learning scores for fifth
graders (p < .08 to p < .74 based on 39 girls
and 32 boys), or the noun-CVC pair drill 
scores for ninth and tenth graders (p < .22 
to p < .96 based on 48 girls and 50 boys). 
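
A two-tailed Mann-Whitney test of the kind described above can be sketched as follows. This is a minimal illustration and not the procedure used in the study: it uses the large-sample normal approximation without a tie or continuity correction, and the scores in the example are invented.

```python
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U for x, U for y, two-tailed p) using a normal approximation."""
    # Pool the observations, remembering which group each came from.
    pooled = sorted((value, group) for group, sample in enumerate((x, y)) for value in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values and give them the average rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    u_y = n1 * n2 - u_x
    mean_u = n1 * n2 / 2.0
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12.0) ** 0.5
    z = (u_x - mean_u) / sd_u
    p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u_x, u_y, p_two_tailed


# Invented LTR proportions for two small groups.
girls = [0.55, 0.70, 0.62, 0.48, 0.81, 0.66]
boys = [0.52, 0.64, 0.59, 0.73, 0.45, 0.60]
u_girls, u_boys, p_value = mann_whitney_u(girls, boys)
print(u_girls, u_boys, round(p_value, 3))
```

For a one-tailed test of a directional hypothesis, as in the ethnic group comparisons, the two-tailed probability would be halved when the difference lies in the predicted direction.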

One-tailed Mann-Whitney tests were 
carried out to test a hypothesis of no differ- 
ence versus a hypothesis of the superiority 
of the scores of white to Spanish-surnamed 
children. The fifth-grade students displayed 
no significant ethnic group differences for 
sentences drill massed or distributed LTR 


Table 15
Means, Standard Deviations, 


scores (p < .09 to p < .16); these students did 
display significant ethnic group differences 
for SFTAA total score (p < .01 based on 14
Spanish-surnamed and 29 white students) 
and CTBS total score (p < .001 based on 37 
Spanish-surnamed and 79 white students). 
There were no significant ethnic group dif- 
ferences for fifth-grade students for sen- 
tences independent learning LTR scores (p 
< .29 to p < .75), for SFTAA total score (p
< .09 based on 15 Spanish-surnamed and 28
white students), or for the CTBS total score
(p < .14 based on 14 Spanish-surnamed and
25 white students). There were no significant 
ethnic group differences for ninth- and 
tenth-grade students for noun-CVC pair 
drill LTR scores (p < .42 to p < .45) or 
SFTAA total score (p < .11 based on 10 
Spanish-surnamed and 19 white students). 


Discussion 


Reasonably reliable learning curves suit- 
able for obtaining parameter estimates from 
an information-processing model were ob- 
tained with a test of practical length in a 
group-testing situation. Scores were pro- 


and Reliabilities of Form A Drill Sentences and Noun-CVC LTR 


Scores and Parameter Estimates for Seventh-Grade Students 
Noun-CVC pairs 


Sentences 
LTR r : 
Variable Masse Distribute ay r Massed Distribute ay 
M 68 84 67 18 4 Hi 58 88 
^ f 93 45 3 T : ; 
F 38 30 mo 55 ‘53 70 E 5i 
Es ^46 46 49 43 45 45 [ 


Note. The n is sample size for M, SD, and raa; correlations in Table 16 are based on this sample with missing observation pairs
excluded. CVC = consonant vowel consonant; LTR = long-term retention.


Table 16
Intercorrelations of Form A Drill Sentences and Noun-CVC LTR Scores and Parameter
Estimates for Seventh-Grade Students


Sentences Noun-CVC pairs 
LTR LTR 
Variable Massed — Distributed ay r Massed Distributed ay 

Senten 

"TR massed 1.00 -16 NI .80 43 35 39 

LTR distributed 1.00 .74 85 44 32 44 

ay 1.00 46 33 27 46 

r 1.00 135 330 40 | 
Noun-CVC 

LTR massed 1.00 -76 64 

LTR distributed 1.00 62 

ay 1.00 


Note. CVC = consonant vowel consonant; LTR = long-term retention. Correlations are based on the same sample as that for Table
15, and sample sizes appear in Table 15.


duced that displayed no significant sex or 
white-versus-Spanish-surnamed group dif- 
ferences. There was substantial restriction 
of range in the samples tested; reliabilities of 
the learning scores and correlations of the 
learning scores with other measures would 
be expected to increase in more heteroge- 
neous samples. 

The information-processing model was 
successful in fitting individual learning 
curves, particularly those based on the ma- 
terials more similar to classroom material. 
The model did not produce reliable param- 
eter estimates unless restrictions were placed 
on some of the parameters, simplifying the 
model. In its most general form the model 
predicts that massing of practice will facili- 
tate learning if b > a whereas distributing 
practice will facilitate learning if a > b. Since 
there were no reliable individual differences 
in LTR scores for the massed versus dis-
tributed trials, it is not surprising that pa-
rameter estimates were not reliable when a
and b could both vary freely. 


Table 17


Correlations of Sentences and Noun-CVC Drill Scores with SFTAA Scores for
Seventh-Grade Students


When the restriction was made 
0, the parameter a was estimated relii 
but the STM parameter 6 (and functi 
it such as 04’) remained unreliable and 
related to LTR scores; how much infor 
tion a student kept in STM did not 
reliable individual differences and ¥ 
correlated with long-term learning. 
the model was restricted to have 0 = 0 
to have no information enter STM) 
model was fitted only to distributed 
scores, resulting parameter estimates W 
just as reliable and valid as estimates | 
tained with the more complex model ¥ 
both massed and distributed trial sco es 
may be that use of a wider variety of dist 
bution of practice would produce relial 
individual differences in the effects of ¢ 
tribution of practice and STM and wi 
demonstrate the importance of all th 
rameters in the general model. The lai 
mean differences in LTR scores betw 
massed and the distributed practice dé 
indicate that, on the average, keeping @ 


Sentences Noun-CVC pairs 
LTR LTR 
SFTAA Dis- Multiple Dis- Multiple 
score Massed tributed a Y r correlation? Massed tributed a Y r correlation 
Language 46 49 62 .25 
: , 62 . 67 24 Bon | 4 61 
Nonl anguage — 48 46 — 64 44 70 35 4330-07 65 
58 A>) IP) 38 78 33 49 — 49 —44 6T 
Note. us 
BEN n 2.94. CVC = consonant vowel consonant; SFTAA = Short Form Test of Academic Aptitude; LTR = long-term? 


a Based on LTR massed, LTR distributed, ay, and r.


Table 18
Multiple Correlations of Sentences and Noun-CVC Parameter Estimates with SFTAA
Scores for Seventh-Grade Students


Sentences 
Noun- and 
SFTAA CVC  noun-CVC 
score Sentences pairs pairs 
Language 63 120 -12 
Nonlanguage .65 59 13 
Total 71 49 NE 
Note. Multiple correlations are based on ay and r. CVC = 
consonant vowel consonant; SFTAA = Short Form Test of 


Academic Aptitude. The n = 34.


of information in STM was not an effective 
learning strategy. Thus, the contrast be- 
tween massed and distributed practice did 
not offer aptitude-treatment interaction 
information but did offer treatment infor- 
mation. 

The information-processing model was 
successful in identifying two fairly reliable 
components of information learning: a pa- 
rameter reflecting how much information 
the student encoded into LTM on each trial 
(ay), i.e., acquisition rate, and a parameter 
reflecting retention of information in LTM 
over 4 days (r). These parameter estimates 
could be helpful in individualization of in- 
struction. For example, students with low ay 
estimates might benefit from a teacher's 
spending more time introducing and re- 
peating new information; students with high 
ay estimates would not need such help. 
Students with low estimates of r might 
benefit from frequent review of previously 
learned material, whereas other students 
would not need as much review. (Also, it is of 
interest that there was a low correlation be- 
tween ay and r, which indicates that those 
who acquired information rapidly were not 
necessarily those who retained the infor- 
mation best.) It remains to be seen whether 
such interpretations of the parameters can 
be upheld in classroom use. The fact that the 
learning scores have as high correlations with 
math achievement scores as with reading 
achievement scores does argue for the gen-
erality of the learning scores.

There was some evidence of reliable dif- 
ferences in the relative benefits of drill and 
independent learning. Such differences, if 


generalizable to a classroom setting, could be 
useful to teachers in helping to match 
teaching style to student characteristics, i.e., 
providing drill for those who benefit from it 
and providing independent learning for 
others. It is of interest that very few students 
learned better on their own than with ex- 
perimenter-supplied drill and that the in- 
dividual differences in the amount learned 
under the two conditions were not related to 
a traditional aptitude measure, i.e., the most 
“intelligent” students were not necessarily 
those who learned better with independent 
learning. 

Correlational results indicated that the 
sentences and paired associates were mea- 
suring somewhat different learning skills, 
with the former more strongly related to 
achievement than the latter. The learning 
scores had lower correlations with achieve- 
ment than did the traditional aptitude scores 
and did not add notable amounts of ac- 
counted variance to the prediction of 
achievement from aptitude scores. This is 
not surprising in light of the fact that the 
achievement and aptitude tests sample 
heterogeneous and somewhat overlapping 
sets of behaviors and the learning measure 
is intentionally made to focus on one area, 
the learning of new information. The learn- 
ing parameters are valuable because of their 
direct implications for teacher actions, as 
outlined in the two preceding paragraphs. 

It is of interest that in the present samples
there apparently was little growth in learning
scores from fifth to seventh grade. It may be
that information-learning skills could be but 
are not taught in the schools. Perhaps stu- 
dents could be taught learning strategies 
such as elaboration or effective organization 
of independent study that would alter their 
learning scores and parameter estimates as 
well as facilitate classroom learning. 


Reference Notes 


. M. Toward practical application of infor-


Student Learning and Performance
Under Varying Conditions of Teacher Verbal 
and Nonverbal Evaluative Communication 


Anita E. Woolfolk 
Douglass College 
Rutgers—The State University 


Four combinations of teacher verbal and nonverbal evaluative behavior were 
studied within a controlled microlesson. Two male and two female teachers 
presented each of the four combinations—(a) verbally and nonverbally posi- 
tive, (b) verbally positive and nonverbally negative, (c) verbally negative and 
nonverbally positive, or (d) verbally and nonverbally negative—to different 
randomly selected samples of sixth-grade male and female subjects. Data 
analysis indicated that teacher negative nonverbal behavior led to significant- 
ly greater performance during the lesson. Teacher verbal behavior also in- 
fluenced subject performance, interacting with the factor of individual teach- 
er. On the measure of learning, females achieved significantly higher scores 
and teacher gender interacted with student gender. 


Previous well-controlled investigation 
of the effects of teacher behavior on student 
learning and performance has failed to ex- 
amine nonverbal dimensions of teacher 
communications. Most researchers have 
assumed that the salient features of 
teacher-student interactions are to be un- 
covered through an analysis of verbal com- 
munication (Galloway, Note 1). Though 
many educators have speculated about the 
significance and influence of nonverbal be- 
havior, little research exists to document its 
impact. 

The present study sought to answer the 
following questions within a well-controlled 
classroom analogue setting: (a) Do teacher 
evaluative verbal statements directed to a 
group of students affect student perfor- 
mance and learning on a task? (b) Does 
teacher nonverbal communication affect 
student learning and performance on the 
same task? (c) Does either student gender or 
teacher gender mediate the influence of
teacher evaluative behavior on student
performance or learning? 

Several studies have found that positive 
verbal evaluation from the teacher is related 
to student achievement. These positive 
statements may be such comments as “good” 
or “thank you” following student comments 
(Wright & Nuthall, 1970) or teacher praise 
of student responses (Hughes, 1973; O'Leary 
& O'Leary, 1976). According to Rosenshine 
(1976), the impact of teacher criticism of 
student responses on achievement has been 
inconsistent. Some investigators have found 
that teacher criticism following incorrect 
student answers is positively related to 
achievement (Stallings & Kaskowitz, Note 
2). Others have found positive correlations 
between teacher criticism and student 
achievement only for high socioeconomic 
status students (Brophy & Evertson, Note 
3). Redd, Morris, and Martin (1975) found 
that reprimands were more effective than 
praise in controlling child on-task behavior 
in a non-classroom setting. 

The influence of teacher nonverbal be- 
havior on student achievement and perfor- 
mance is much less clearly defined than the 
role of verbal behavior. Several studies have 
demonstrated that adults, in the role of tu- 
tors, present different nonverbal messages 
to tutees, depending on some characteristic 


of the tutee such as IQ (Chaikin, Sigler, &
Derlega, 1974) or race (Feldman, Note 4). 
Chaikin et al. found, for example, that tutors 
evidenced more smiles, affirmative head 
nods, forward body leaning, and eye contact 
when they believed their tutees were 
“bright” than when led to believe they were 
“dull.” These researchers recommended that 
the impact on students of this differential 
nonverbal treatment be studied systemati- 
cally. 

One such attempt to study the effects of 
teacher nonverbal behavior found that 
varying voice tones (positive, negative, or 
neutral) on instructions for a task had no 
differential impact on middle-class 5-year- 
olds’ performance on the task. Lower-class 
children, however, performed significantly 
better when the instructor’s voice tone was 
positive (Kashinsky & Wiener, 1969). A 
contradictory finding is presented by Mid- 
dleman (1972). She found that black lower 
socioeconomic children were more produc- 
tive on one of three tasks administered when 
the nonverbal behavior of the teacher was 
negative. No difference was found on any 
task for white children. It is difficult to 
generalize from these studies, however, since 
the teacher in both cases was presented on 
either audio- or videotape. Students had no 
reason to believe the teacher's behavior was 
directed toward them personally. In an at- 
tempt to study response to teacher nonver- 
bal behaviors in a setting that would supply 
increased external validity, Woolfolk and 
Woolfolk (1974) examined the effects on 
students of teacher evaluative behavior
during an experimental microlesson. 
Subjects were taught by a female instructor 
whose behavior was programmed to allow 
the verbal and nonverbal inputs of teacher 
communication to be systematically varied 
across conditions. Both verbal and nonverbal 
behavior were found to affect subjects’ per- 
ception of the teacher and attraction for her, 
but verbal behavior had the greater impact 
on students' responses. 

Although the goal of increased external 
validity was achieved with the Woolfolk and 
Woolfolk paradigm, several issues were not 
explored in the initial study. Specifically, the 
influence of teacher verbal and nonverbal 
communications on student learning was not 




explored. The present study utilized the
microlesson format developed by Woolfolk
and Woolfolk to investigate teacher gender
and student gender as mediators of the ef-
fects of verbal and selected nonverbal com-
ponents of teacher evaluative communica-
tions on student performance and learn-
ing.

The nonverbal behaviors examined in this
study were facial pleasantness (smile), af-
firmative head nod, and tone of voice (more
accurately called a paralinguistic behavior).
The rationale for studying these and no
other nonverbal behaviors is as follows: Fa-
cial pleasantness and head nod have been
found to communicate liking and positive
evaluation of a recipient (Mehrabian, 1972).
Other nonverbal behaviors included by
Mehrabian in this evaluative dimension of
nonverbal communication were excluded
from the present study because they could
not be systematically manipulated within
the limitations of the microlesson task or
because they had been found in previous
research to convey different meanings when
made by female versus male communicators.
Voice tone was included because it is a nec-
essary concomitant of spoken communica-
tions.

A frequent finding in research on the
communication of messages through non-
verbal channels is that the sex of the recipi-
ent and the communicator affects the de-
coding of the message sent. Rosenthal,
Archer, DiMatteo, Koivumaki, and Rogers
(Note 5) reviewed 43 independent studies of
adult and child decoding of nonverbal cues.
In 77% of the studies females were superior
in accurately judging messages communi-
cated by facial expression, body movement,
or voice tone. Sex of communicator has also
been shown to affect the decoding of in-
consistent adult messages by children. Bu-
gental, Kaswan, and Love (1970) found that
positive messages were discounted in the
communications of female speakers if any of
the components of the message (verbal, fa-
cial, or vocal) was negative. This discounting
did not occur for male communicators.

The following hypotheses were developed
based on the findings described above in the
areas of teacher praise and criticism, teacher
nonverbal behavior, and the effects of sex of




TEACHER COMMUNICATIONS AND STUDENT PERFORMANCE 89 


communicator and recipient on the decoding 
of a nonverbal message. It was predicted that 
positive verbal statements and positive 
nonverbal communications would lead to 
increased performance and learning. Second, 
it was hypothesized that female students 
would be more sensitive than male students 
to teacher nonverbal behavior. Specifically, 
it was predicted that student sex would in- 
teract with teacher nonverbal behavior such 
that the responses of female students to 
teacher positive nonverbal behavior would 
be more positive than the responses of male 
students and more negative than those of 
male students to teacher negative nonverbal 
behavior. 


Method 


Subjects 


The subjects were 128 students randomly selected
from the entire sixth-grade class of a suburban middle 
school in New Jersey. The school serves a predomi- 
nantly middle-class area near a large state university. 
After the loss of two subjects because of illness, the final 
sample consisted of 62 females and 64 males. 


Experimenters and Teachers 


Experimenters in the present study were five grad-
uate students in psychology (three females and two
males). Experimenters were randomly assigned to 
conditions and teachers. 

Two male and two female undergraduate students in 
teacher education served as teachers in the study. They 
were blind to both the dependent variables and the 
hypotheses being investigated and were paid for their 
participation in the study. 


Design 


Four combinations of verbal and nonverbal evaluative 
communication were presented by each of the four 
teachers. Thus there were 16 cells. Subjects were ran- 
domly assigned within gender groups to the 16 cells such 
that each cell contained four males and four females. 
(Subjects’ absences and schedule changes on the days 
of the study caused some cells to vary from this balance 
of males and females.) Teachers were randomly as- 
signed to cells within each of the four combinations of 
teacher evaluative communication. Each of the four 
teachers presented each combination of verbal and 
nonverbal communication to a different group of eight 
students. Each subject participated in only one session, 
thus being exposed to only one teacher and one combi-
nation of verbal and nonverbal teacher behavior.

The experiment utilized a 2 × 2 × 2 × 2 partially


nested factorial design. The factors of teacher sex, stu- 
dent sex, teacher verbal evaluative communication, and 
teacher nonverbal evaluative communication were all 
at two levels and crossed. The individual teacher factor 
was nested within teacher sex. 
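
The cell structure described above can be sketched in a few lines of code. This is only an illustration of stratified random assignment, not the authors' actual procedure: the teacher labels, subject identifiers, seed, and the function name `assign_to_cells` are all hypothetical, and the sketch assumes the intended balance of 64 males and 64 females before any absences.

```python
import random
from itertools import product

def assign_to_cells(males, females, seed=0):
    """Randomly assign subjects to 16 cells, stratified by gender.

    Cells are the 4 teachers crossed with the 4 combinations of
    verbal and nonverbal evaluation. All labels are hypothetical.
    """
    rng = random.Random(seed)
    teachers = ["male_1", "male_2", "female_1", "female_2"]
    conditions = list(product(["verbal+", "verbal-"],
                              ["nonverbal+", "nonverbal-"]))
    cells = list(product(teachers, conditions))  # 4 x 4 = 16 cells
    assignment = {cell: [] for cell in cells}
    # Shuffle each gender group separately, then deal subjects
    # round-robin so every cell gets the same number of each gender.
    for group in (males, females):
        shuffled = list(group)
        rng.shuffle(shuffled)
        for i, subject in enumerate(shuffled):
            assignment[cells[i % len(cells)]].append(subject)
    return assignment

males = [f"m{i}" for i in range(64)]
females = [f"f{i}" for i in range(64)]
cells = assign_to_cells(males, females)
print(len(cells), "cells;", len(next(iter(cells.values()))), "subjects per cell")
```

With 64 subjects of each gender, every cell ends up with four males and four females, which is the balance the design aimed for.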


Microlesson Task 


The experimental manipulation of teacher verbal and 
nonverbal evaluative communications was embedded 
within a vocabulary lesson. The English teachers in the 
subjects' school identified 16 words they believed were 
unknown to a majority of the sixth-grade students. The 
8 words used in the vocabulary lesson were randomly 
chosen from this list of 16 words, The subjects’ task 
during the microlesson was to write as many sentences 
as possible using the words, then recall the correct 
spelling and definition of each word when tested. 


Procedures 


During the 2 weeks prior to the study, the four ex- 
perimental teachers received 15 hours of training in the 
presentation of both positive and negative evaluation 
through verbal and nonverbal channels. To check the 
effectiveness of training, I prepared a videotape on 
which each teacher demonstrated two randomly se- 
lected statements from each experimental condition 
plus two neutral statements. These statements were 
audiotaped and videotaped during a simulated teaching 
situation in which six adults played the role of stu- 
dents. 

Several different checks were completed on the 
teachers’ presentation of nonverbal messages. Five in- 
dependent judges rated the videotape without sound 
on a 13-point scale (+6 indicated friendly, warm, and 
approving; —6 indicated unfriendly, cold, and disap- 
proving). Second, the voice tone of the messages was 
rated by another group of judges using the same 13- 
point scale. The audiotape was passed through a 
band-pass filter to mask speech content for this check. 
Mean ratings for voice tone and picture without sound 
are presented in Table 1. A factorial analysis of variance 
conducted on the ratings showed no significant main 
effect or interaction involving teacher sex, individual 
teacher, or individual rater. The evaluative content of 
the verbal statements was determined by a third group 
of five raters. On the same 13-point scale a mean rating 
was found of +3.73 (SE = .16) for the positive state-
ments and —3.63 for the negative statements (SE =
[illegible]).

Table 1
Judgments of Nonverbal Teacher Behaviors

                           Sex of teacher
                       Female         Male
Nonverbal behavior     M      SE      M      SE
Positive picture       39     .38     3.7
Negative picture       38     25      41
Neutral picture        3      34      -1     4
Positive voice tone    [illegible]
Negative voice tone    [illegible]
Neutral voice tone     [illegible]

The experiment was conducted in the subjects’ school 
on 8 consecutive school days during the late morning. 
(Eight days were needed to complete procedures for all 
16 cells. However, each group of subjects participated 
in the experiment only once, on 1 of the 8 days.) In each 
cell, subjects were brought by an experimenter from 
their study hall to a vacant classroom. After being 
seated, the subjects were told by the experimenter that 
they were going to participate in a vocabulary lesson, the 
purpose of which was to investigate how students learn. 
The students were also informed that because an im- 
portant purpose of the lesson was to “find out how kids 
learn without asking any questions at all,” they would 
be prohibited from asking questions during the lesson. 
In this manner teacher communication, other than that 
which was experimentally manipulated, was con- 
trolled. 

The experimenter then introduced the teacher. The 
various teachers wore clothing of equivalent formality, 
in keeping with the norm for the regular teachers in the 
school. Male teachers were introduced as Mr. Ross and 
female teachers as Miss Lee. No other information 
about the teacher was given. 

In every condition the teacher stood in front of the 
subjects, pronounced the first vocabulary word, and 
displayed an 18 X 14 in. (45.7 X 35.5 cm) card showing 
the word printed. She/he then spelled the word and 
used it in two example sentences. After presenting the 
vocabulary word, the teacher instructed the subjects to 
write as many "interesting and original sentences" as 
possible in 2 minutes, using the word. During this 2- 
minute writing period the teacher walked around the 
room ostensibly examining the students' work but not 
interacting with the students in any way. All teacher 
behaviors during the presentation of the word and the 
sentence-writing period were neutral. 

Immediately after each 2-minute work session and 
before the presentation of the next vocabulary word, the
teacher rendered a two-sentence evaluation of the
subjects’ work. The varying of these evaluations across
conditions was the experimental manipulation. In 
Condition 1, the teacher’s positive verbal statements to 
the subjects (e.g., "You're writing very interesting
sentences. This must be a smart class.") were accom-
panied by the positive nonverbal communications of
pleasant voice tone, head nod, and smiling face. In 
Condition 2, the same positive verbal statements were
accompanied by negative nonverbal behaviors of angry 
voice tone, horizontal movement of head, and frowning 
face. In Condition 3, the teacher gave negative verbal 
evaluations (e.g., "You're not writing very interesting 
sentences. This must not be a smart class.") accompa- 
nied by the positive nonverbal elements of pleasant 
voice tone and so on. Condition 4 contained only nega-
tive verbal statements and negative nonverbal com-
munications.

The sequence of neutral presentation of a vocabulary
word, 2-minute sentence-writing period, and teacher
evaluation was repeated eight times. Thus in each
condition the subjects received eight two-sentence
evaluations from the teacher.

In every condition, after the last evaluation, the
teacher left the room, and the experimenter adminis-
tered the spelling posttest and collected the sentences
written by the subjects.

Table 2
Means and Standard Deviations for Performance and Learning Scores by Condition

Condition                      Performance score
                               M       SD
1: Verbal +, Nonverbal +       25.6    6.6
2: Verbal +, Nonverbal —       29.4    6.3    328
3: Verbal —, Nonverbal +       19.0    1.6.1  2.7
4: Verbal —, Nonverbal —       25.8    9.0    3.0

Note. Maximum learning score = 8. The + means positive; the — means negative.


Dependent Measures 


Performance during the lesson was assessed by de-
termining the total number of sentences written by each
subject in the eight 2-minute sentence-writing periods.

Students in all four conditions had exactly 16 minutes
to write sentences. As described in the procedures, the
teacher did not speak during this time, so subjects were
able to focus completely on the writing task. The
number of sentences written by each subject was con-
sidered his or her performance score. The difference
between the subject’s pre- and posttest spelling scores
was the learning score for each subject.


Results 


Performance 


A five-way analysis of variance was
conducted on the performance scores. The
first four factors were treated as fixed effects.
The fifth factor was treated as a random ef-
fect. Mean performance scores for subjects
in each condition are presented in Table 2.
A significant main effect on performance
was found for nonverbal behavior, F(1, [illegible]) =
25.83, p < .05. Students experiencing
negative nonverbal behavior wrote signifi-
cantly more sentences than students in the
positive nonverbal conditions (a mean of [illegible]
sentences in the negative condition com-
pared with 22.5 in the positive condition). A
significant interaction involving verbal
communications and individual teacher was
found, F(2, 9) = 5.69, p < .01. For three of
the teachers, positive verbal statements were
associated with greater performance than
negative verbal statements. The effect of
positive and negative verbal statements was
reversed for the fourth teacher. This inter-
action is shown in Figure 1. With Tukey’s
procedure for making multiple comparisons
among means (Kirk, 1968), the difference
between the mean performance scores in the
positive and negative verbal condition for
each teacher was found to be significant at
the .05 level. Thus for three teachers, positive
verbal statements led to significantly greater
student performance, whereas negative
verbal statements led to significantly greater
student performance for the fourth teacher
(a female).

Figure 1. Mean performance scores by individual
teacher and by level of verbal behavior.


Learning 


The five-way analysis of variance de- 
scribed above was conducted on the pretest 
spelling scores. No significant Verbal X 
Nonverbal X Individual Teacher interaction 
was found. Thus it was assumed that the 
subjects in the 16 cells were initially com-
parable in ability to spell the eight vocabu-
lary words used in the study.

The same five-way analysis of variance 
was conducted on the learning scores. Mean 
learning scores for subjects in each condition 
are also presented in Table 2. A significant 
main effect was found for student gender, 
F(1, 2) = 18.19, p < .05. Females proved to
be better students of spelling, improving
more by the end of the lesson. The mean
learning score for females was 3.1, and the 
mean for males was 2.8. As depicted in Fig- 
ure 2, a significant Teacher Sex X Student 
Sex interaction was found, F(1, 2) = 23.38, 
p < .05. The best combination (in terms of
student learning) was female teacher and 
female student. The worst combination was 
female teacher and male student. The mean 
scores of male and female students working 
with male teachers were similar and between 
the two extreme means described above. In 
addition to these two significant effects, 
there was a trend noted. Teacher nonverbal 
behavior tended to affect female students 
more strongly than male students, F(1, 2) = 
8.09, p < .10. The spelling scores of male 
students appeared unaffected by teacher 
nonverbal behavior, whereas females made 
greater gains when teachers were nonver- 
bally negative. 

Tukey’s technique for making multiple 
comparisons among means was used to 
identify significant differences among the 
means in the Teacher Sex X Student Sex 
interaction. The mean learning score of male 
students working with female teachers was 
significantly lower than the mean score of 
female students working with female 
teachers and male students working with 
male teachers. However, the mean learning 
score of male students working with female 
teachers did not differ significantly from the 
mean learning score of female students 
working with male teachers. Male teachers
were no more successful with male than with
female students. But female students
learned significantly more with female than
with male teachers.

Figure 2. Mean learning scores by sex of student and by
sex of teacher. (Horizontal axis: student sex; separate
lines for female teachers and male teachers.)


Discussion 


In interpreting the results of this study it 
must be kept in mind that a classroom ana- 
logue design was used. Students met in a 
regular classroom and participated in an 
ordinary school task, but the teacher was 
unfamiliar and the exposure to him or her 
was brief. The design of the study necessarily 
limits the immediate generalizing of the 
findings to naturalistic settings until sup- 
porting results are obtained in those settings. 
Still, the following comments point a possi- 
ble direction for future research and theo- 
rizing. 

Perhaps the advice “Don’t smile until 
Christmas” has received empirical support 
in this investigation. Student performance 
rate was significantly higher for teachers who 
were nonverbally negative, regardless of the 
sex of the teacher or the sex of the student. 
This finding is not congruent with Middle- 
man’s (1972) results indicating that non- 
verbal behavior of a videotaped teacher did 
not affect the performance of the white
middle-class children in her sample. Those
children were similar in age and background 
to the students in the present study. 

The finding that negative nonverbal be- 
havior is more effective in increasing student 
performance is more in keeping with the 
results of Redd et al. (1975). Subjects in this 
study worked harder for a negative adult 
than for a positive or neutral adult. However, 
in the Redd et al. study, only verbal behavior 
was manipulated systematically. Smiles in 
the positive condition, and presumably 
frowns in the negative condition, accompa- 
nied the verbal statements. The effect of 
nonverbal behavior was not examined sep-
arately from verbal behavior.

In the present study, Condition 1 (positive
verbal-positive nonverbal) and Condition 4
(negative verbal-negative nonverbal) most
[illegible], respectively, in the Redd et
al. study. Yet, in the present study, the mean
performance scores for subjects in these
conditions were almost identical.

The actual difference accounting for the
main effect for nonverbal behavior was be-
tween Condition 2 and Condition 3 (see
Table 2). Positive words accompanied by
negative nonverbal behavior was the most
effective combination, and negative words
accompanied by positive nonverbal behavior
was the least effective. Perhaps the combi-
nation of verbal and nonverbal behavior
presented in Condition 2 represents the op-
erationalizing of the “firm but fair” teacher.
The words are supportive, but the nonverbal
behavior communicates seriousness and
control. It is possible that the teacher whose
words are negative and nonverbal behaviors
are positive is seen as timid, anxious, [illegible],
not confident, or unassertive and, therefore,
not to be taken seriously.

A recent study by Bugental, Henker, and
Whalen (1976) provides some support for the
above explanation of the ineffectiveness of
negative words coupled with positive non-
verbal behavior. These researchers found
that individuals who expect to be ineffective
sources of influence interpersonally com-
municate this expectation by saying assert-
ive statements in an unassertive voice tone.
On the other hand, individuals who see
themselves as effective sources of influence
speak with an assertive voice tone while
usually saying less assertive statements. It
is possible that saying “These are not very
good sentences” in a warm friendly voice
tone while smiling (Condition 3) communi-
cates an apology for being critical or an ex-
pectation that the “assertive” verbal criti-
cism will have little influence on students.

If Conditions 2 and 3 in this study ap-
proximate low verbal assertive-high vocal
assertive and high verbal assertive-low vocal
assertive conditions, respectively, then the
results of this study are similar to those of
Bugental and Love (1975). These investiga-
tors found that parents who were ineffective
in controlling their children’s behavior were
characterized by high verbal assertiveness
accompanied by low-assertive voice tone.
The audiotape used in this study is currently
being rated with the Bugental et al. (1976)
method for measuring vocal assertiveness.

Neither teacher verbal statements nor
teacher nonverbal behavior had a significant
effect on student learning. Factors in- 
fluencing student learning significantly were 
the gender of teacher and student. Female 
students were better students of spelling. 
This is not surprising since girls tend to excel 
in verbal skills during the age range of the 
subjects in the present study. The impact on 
learning of the sex of the student in relation 
to the sex of the teacher deserves a closer
look. 

The Teacher Sex X Student Sex interac- 
tion could be interpreted as meaning that 
female teachers are more effective with fe-
male than with male students in the sixth
grade, at least when teaching a verbal skill. 
The results are not in keeping with findings 
of other studies that examined the effects of 
teacher sex on student achievement (e.g., 
Bennett, 1967; Peterson, 1972). These re- 
searchers were primarily interested in de- 
termining whether the presence of male 
teachers would lead to greater achievement 
for male students. Although male students 
did learn more in the present study with 
male than with female teachers, this con- 
clusion has not been unanimously supported 
by other research (Brophy & Good, 1974). 
The difference found in this study, though 
significant at the .05 level, was not large. 
Male students working with male teachers 
had a mean learning score of 2.88, whereas 
male students working with female teachers 
had a mean of 2.64. 

Some studies have found that student 
achievement is greater when the teacher is 
female (Bennett, 1967; Lahaderne & Cohen, 
Note 6). The advantage of female teachers 
was true only for female students in this 
study. Students learned best, in the situation 
studied, with same-sex teachers. 

The prediction that student sex would 
interact with teacher nonverbal behavior, 
which would indicate that female students 
were more sensitive to nonverbal cues, was 
only partially supported. A trend was noted. 
Females made greater gains in spelling when 
the teacher, regardless of sex, was nonver- 
bally negative. Instead of interacting with 
teacher nonverbal behavior, student sex in- 
teracted with teacher sex as described 
above. 


The results of our earlier investigations
using the paradigm employed in the present
study (A. Woolfolk, Garlinsky, & Nicolich, 
1977; R. Woolfolk & A. Woolfolk, 1974) in- 
dicate that differences in teacher nonverbal 
behaviors are perceived by students and in- 
fluence student liking for the teacher and 
willingness to self-disclose to the teacher. 
The findings of this study suggest that 
teacher nonverbal behavior also affects 
student performance on a task supervised by 
the teacher. Taken together with the find- 
ings of Middleman (1972), the results of this 
study indicate that teacher negative non- 
verbal behavior may be more motivating 
than teacher positive nonverbal behavior for 
both low and middle socioeconomic status 
students, at least in the situations sampled 
in the two studies. Future research should 
examine the relation between the nonverbal 
assertiveness of teachers and the teachers’ 
effectiveness in influencing students.


Processes Mediating the Relationship Between Cooperating-Teacher
Behavior and Student-Teacher Classroom Performance


Willis D. Copeland 
University of California, Santa Barbara 


In attempting to explain previously detected relationships between coopera- 
ting-teacher behavior and student-teacher behavior subsequent to micro- 
teaching training, two alternate hypotheses were tested: (a) that the model of 
target skill utilization provided by the cooperating teacher encourages stu- 
dent-teacher skill utilization, and (b) that the classroom ecological system as 
shaped by cooperating-teacher skill utilization supports student-teacher skill 
utilization. Thirty-two credential students were randomly assigned to two lev- 
els of each treatment that reflected the two hypotheses, were provided with 
microteaching training in the target skill, and were subsequently observed
during classroom teaching. A two-way analysis of variance of skill-utilization
scores supported Hypothesis b. Results are discussed in terms of implications
for teacher education and classroom-based research.


The current emphasis on competency- 
based teacher education has led in recent 
years to an examination of training methods 
used by teacher-education institutions to 
equip credential candidates with specific 
instructional skills. Microteaching (Allen, 
Note 1) has been adopted by many institu- 
tions as an appropriate training format 
(Johnson, Note 2). Recent research, however, 
has cast doubt on the assumption that par-
ticipation in a microteaching training se-
quence will, alone, cause student teachers to
exhibit target skills in the classroom subse-
quent to training (Copeland, 1975; Copeland
& Doyle, 1973; Peterson, 1973; Allen,
McDonald, & Orme, Note 3; Brashear &
Davis, Note 4). Copeland (1977) investigated
whether the failure to exhibit target skills in
the classroom was due to mere [illegible]
to the systematic effect of other variables. He
found that the extent to which the coopera-
ting teacher actually used the target skill in
the classroom was associated with the stu- 
dent teacher’s use of the skill subsequent to 
training. This finding was consistent with 
other research reporting similar relation- 
ships between cooperating-teacher and 
student-teacher performance (Mitchell,
1969; Roberts & Blankenship, Note 5; Sep-
erson & Joyce, Note 6; Flint, Note 7).

The effect of the cooperating teacher on 
student-teacher skill utilization may be ex- 
plained in at least two ways: First, the co- 
operating teacher’s influence may be un- 
derstood in terms of social-learning theory 
(Bandura & Walters, 1963). From this per- 
spective it might be hypothesized that the 
cooperating teacher's utilization of the target 
skill would serve as a model of performance 
for the student teacher. Observing the model 
would positively incline the student teacher 
to use, in the classroom, the skill that had 
previously been mastered in the training 
laboratory. 

Second, the apparent effect of the coop-
erating teacher on student-teacher skill
utilization may be understood in terms of 
what Doyle and Ponder (1975) have called 
the ecological system of the classroom, that 
is, “that network of interconnected processes 
and events which impinges upon behavior in 
the teaching environment” (p. 183; see also
Bronfenbrenner, 1976). From this perspec-
tive, skill utilization by a student teacher
would seem to depend on the degree to which 
the target skill acquired in training is con- 
gruent with the ecology of a particular 
classroom. The cooperating teacher, by using 
the target skill prior to entry by the student 
teacher, shapes the ecological conditions in 
the classroom that would foster and sustain 
skill utilization. On the other hand, attempts 
to utilize a skill that is ecologically incon- 
gruent with a particular classroom would 
most likely fail. 

The intent of the present study was to 
determine which of these hypothesized 
processes—modeling or ecological con- 
gruence—mediates the previously detected 
relationship between cooperating-teacher 
performance and the classroom utilization 
by student teachers of skills acquired during 
microteaching. 


Method 
Subjects 


Thirty-two first-year graduate students who were 
candidates for elementary teaching credentials and who 
were enrolled in the same credential program at the 
University of California, Santa Barbara were used as 
subjects for this investigation. All subjects were assigned 
to teach in Grades 2, 3, 4, or 5 during the spring semes- 
ter, 1976. The group was composed of 23 females and 9 
males with a mean age of 25.2 (SD = 3.16). 


Microteaching Training 


All subjects participated in a 3-week microteaching 
program that closely approximated the Stanford mi- 
croteaching training format (Allen, Ryan, Bush, & 
Cooper, 1969). The subjects were provided with a de- 
scription and film model of the target skill, an oppor- 
tunity to teach an 8-min. lesson using the target skill to 
a group of four elementary school pupils, feedback of 
teaching performance via videotape and direct super- 
vision, and an opportunity to revise and reteach the 
lesson to different pupils. This teach-feedback-reteach 
cycle was repeated three times in each of three training
sessions for each subject. The first training session in-
volved practice in “completeness of communication”
skills and served to accustom the subjects to the mi-
croteaching procedure. The second session provided
practice in the target skill for this investigation, which
was asking probing questions. Probing questions are
defined as teacher questions that are asked following
a pupil’s verbal statement, are based in the substance
of that statement, and are intended to encourage the
pupil to go further in the thought expressed in the
statement (Allen, Ryan, Bush, & Cooper, 1969). The
third microteaching training session provided an op-
portunity for practicing the first steps of the teaching
strategy developing generalizations (Taba, Durkin,
Fraenkel, & McNaughton, 1971). This strategy was 
chosen for the third session because its use necessitated 
continued practice of the target skill asking probing 
questions. 


Experimental Procedure 


A 2 X 2 design was used to examine the relationship 
of interest to this study, that is, the relationship between 
the dependent variable, which was the rate of student- 
teacher exhibition of the target skill in the classroom 
subsequent to training, and the two independent vari- 
ables which were the model offered by the cooperating 
teacher and the ecological system of the classroom.
There were two levels of each of the two independent
treatment variables: Variable A—the cooperating 
teacher provided the model that was characterized by 
a high or low rate of exhibition of the target skill, and 
Variable B—the student teacher taught in a classroom 
ecological system that was or was not accustomed to 
teacher exhibition of the target skill. 

In order to establish conditions for controlling the two 
independent treatment variables, specific procedures 
were used. For Variable A, to insure that the subjects 
could teach with and have opportunity to observe a 
cooperating teacher who provided a model of either a 
high or a low rate of exhibition of the target skill, a group 
of sixteen cooperating teachers was identified for each 
of the two levels of the variable. To identify these
teachers, 87 experienced elementary school teachers 
who were eligible to work as cooperating teachers with 
credential student subjects were rated according to their 
classroom use of the target skill. To obtain these ratings 
the experienced teachers were asked to submit three 
audiotape recordings of themselves conducting what 
they felt were typical reading-group teaching sessions. 
These recorded sequences were transrecorded in ran- 
dom order onto master tapes, and the number of min- 
utes and seconds of teacher talk was noted. The master 
tapes were then coded by two trained raters. The coding 
process required the raters to count the number of times 
the teacher exhibited the target skill on each tape. Each 
rater received two thirds of the sequences so that one 
third was coded by both raters; this was to verify inter- 
rater reliability. The correlation between raters was 
maintained above r = .85 for this procedure.

The scores thus derived, which represented the 
number of times the experienced teachers exhibited the 
target skill during each recorded sequence, were then
divided by the number of minutes and seconds of
teacher talk present in the sequence. The result repre-
sented the number of incidents of target skill exhibition
per minute of teacher talk. This procedure was com-
pleted for each of three recordings for each experienced
teacher. The mean of the three scores thus derived was
computed, yielding for each experienced teacher a mean
score of incidents of probing questions asked per minute
of teacher talk. Hereafter, this score is referred to as the 
teacher's skill-utilization score. 
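
As a worked illustration of the score just defined, the following minimal sketch computes probing questions per minute of teacher talk for each recording and then averages across the recordings. The function name `skill_utilization_score` and the example counts and talk times are hypothetical and are not taken from the study's data.

```python
def skill_utilization_score(recordings):
    """Mean number of probing questions per minute of teacher talk.

    `recordings` is a list of (probing_question_count, talk_seconds)
    pairs, one pair per recorded teaching sequence.
    """
    # Rate for each recording: incidents divided by minutes of teacher talk.
    rates = [count / (seconds / 60.0) for count, seconds in recordings]
    # The teacher's score is the mean of the per-recording rates.
    return sum(rates) / len(rates)

# Hypothetical teacher with three recordings:
# 6 questions in 3 min, 4 questions in 4 min, 9 questions in 3 min.
example = [(6, 180), (4, 240), (9, 180)]
print(skill_utilization_score(example))  # rates 2.0, 1.0, 3.0 -> mean 2.0
```

Averaging the per-recording rates, rather than pooling all counts and all talk time, follows the wording of the procedure, which takes the mean of the three scores.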

Thirty-two experienced teachers with the highest and 
32 with the lowest skill-utilization scores were selected
from the group of 87 to participate in the investigation.
Sixteen teachers in the high group and 16 in the low
group were then randomly selected to be the coopera-
ting teachers to whom the 32 subjects were assigned for
spring-semester student teaching. (The function of the 
other 16 experienced teachers in each group is explained 
in a later section.) In this manner it was possible to 
differentiate the levels of Variable A for student 
teachers in the school setting. In order to increase fur- 
ther this difference between levels of Variable A, the 
experienced teachers who were rated high in skill uti- 
lization participated in a complete microteaching se- 
quence in which they further developed their use of the 
target skill. In addition, these teachers were encouraged 
to make a special effort to use the target skill whenever 
possible in their own teaching, especially during read- 
ing-group instruction. 

Three weeks after the selected experienced teachers 
completed their additional training and 1 week after the 
subjects ended their microteaching, the spring semester 
of student teaching began. Sixteen subjects were ran- 
domly assigned to classrooms belonging to experienced 
teachers who had been rated high and had received 
additional training in the target skill. The remaining 16 
subjects were assigned to experienced teachers who were 
rated low and had received no training. By virtue of this 
assignment, the two levels of Variable A were estab- 
lished, that is, 16 subjects student taught in classrooms 
in which the cooperating teacher provided a model 
characterized by high use of the target skill, and the 
other 16 subjects student taught in classrooms in which 
the cooperating teacher provided a model of low target 
skill use. 

It was also necessary to establish the two levels of 
Variable B, the degree to which the classroom ecological 
system was or was not accustomed to teacher use of the 
target skill. To establish these conditions for Variable
B, the 32 selected experienced teachers to whom stu- 
dent-teacher subjects had not been assigned were uti- 
lized. Sixteen of these experienced teachers had high 
skill-utilization scores and had received further training 
in use of the target skill. The other 16 had low scores and 
no training. The selection and training of these teachers 
established two distinct classroom systems differen- 
tiated in terms of the degree to which the teacher be- 
havior asking probing questions was a regularly oc- 
curring feature of the ecology. These classrooms were 
not, however, the classrooms in which the subjects were 
student teaching.

During the third week of spring student teaching (4 
weeks after the subjects completed microteaching 
training), all subjects were instructed to take 30 min. out 
of their regular student-teaching assignment each day 
and move to alternate classrooms to teach reading 
groups. These alternate classrooms belonged to the 
second group of 32 experienced teachers described 
above. The subjects spent only 30 min. each day in these 
alternate rooms, during which time they were busy 


teaching reading. This time limitation insured that the
subjects had no opportunity to observe the
experienced teachers in the alternate classrooms teach.

Eight of the 16 subjects whose cooperating teachers were rated
high in skill utilization were randomly assigned to teach
reading groups in alternate classrooms belonging to
experienced teachers who were also rated high in skill
utilization. This meant that these 8 subjects had had an
opportunity for 3 weeks to observe a “high” model of 
skill utilization (Variable A) and then taught in a “high” 
ecological system (Variable B). The other 8 subjects who 
taught with “high” cooperating teachers were randomly 
assigned to teach in the classrooms of “low” alternate 
experienced teachers. They therefore had had an op- 
portunity to observe a “high” model but taught in a 
“low” classroom. The 16 subjects who had taught with
“low” cooperating teachers were likewise randomly 
assigned to teach in classrooms of either “high” or “low” 
alternate experienced teachers. 

The rather complicated assignment procedure de- 
scribed above yielded the 2 X 2 design illustrated in 
Table 1. Thirty-two subjects were randomly assigned 
to two levels of two treatment variables so that they (a) 
were able for 3 weeks to observe a model provided by a 
cooperating teacher who had a high or low rate of ex- 
hibition of the target skill and (b) taught in a classroom 
ecological system that was or was not accustomed to 
teacher utilization of the target skill. 


Data Collection 


During the week that the subjects were assigned to 
teach reading groups in the alternate classrooms, they 
were also given the assignment of making audio re- 
cordings of themselves conducting the reading-group 
sessions. Each subject made four 15-min. recordings in 
5 days, no two being made on the same day. The subjects 
made the recordings in response to an assignment given 
in a seminar course, and no reference was made to the 
previously completed microteaching activity. 

It is acknowledged that collecting this field-test data 
in classrooms other than those to which the subjects had 
been assigned for student teaching adds an element of 
uncertainty to the study. It may be that the subjects 
behaved differently while teaching new pupils with 
whom they were not acquainted than they would have
behaved had the data been collected while they taught
pupils with whom they had worked for the past 3 weeks.
This manner of data collection was necessary, however,
in order to control adequately both treatment variables.
Further, the effects of this procedure, if any, should be 
equally distributed across all groups. 


Data Analysis and Results 


The individual tape recordings for all 
subjects were processed and rated in the 
same manner as were the recordings of the 
experienced teachers described above. All 
correlations between raters for this pro- 
cessing yielded r = .82 or higher. The means 
and standard deviations of the skill-utiliza- 
tion scores for subjects in the four treatment 
groups are presented in Table 1. 
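
Because each of the four treatment groups contains the same number of subjects (n = 8), the marginal and overall means in Table 1 are simple averages of the cell means. The following is a minimal sketch of that arithmetic only; the cell means are read from Table 1 with decimal points restored, and the variable and function names are illustrative assumptions rather than part of the original analysis.

```python
# Cell means of skill-utilization scores, keyed by
# (cooperating-teacher model, classroom ecological system).
# Values are read from Table 1; the key labels are illustrative.
cell_means = {
    ("high", "high"): 2.13,
    ("high", "low"): 1.07,
    ("low", "high"): 1.71,
    ("low", "low"): 0.94,
}


def marginal_mean(factor_index, level):
    """Average the cell means sharing one level of one factor (equal n per cell)."""
    values = [m for key, m in cell_means.items() if key[factor_index] == level]
    return sum(values) / len(values)


model_high = marginal_mean(0, "high")    # Variable A, "high" model
model_low = marginal_mean(0, "low")      # Variable A, "low" model
ecology_high = marginal_mean(1, "high")  # Variable B, "high" ecology
ecology_low = marginal_mean(1, "low")    # Variable B, "low" ecology
grand_mean = sum(cell_means.values()) / len(cell_means)

print(model_high, model_low, ecology_high, ecology_low, grand_mean)
```

With equal group sizes these averages agree, after rounding, with the overall row and column of Table 1; with unequal group sizes a weighted average would be needed instead.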

These skill-utilization scores were submitted to a 2 X 2 analysis of
variance in order to determine the effects of the two treatment
variables and their possible interaction on the subjects’ performance.
The results of this analysis, as summarized in Table 2, indicate
that Variable B, the classroom ecological system, had a significant
effect on the subjects’ utilization of the target skill. Student
teachers who taught in a classroom ecological system accustomed to
the use of the target skill were more likely to utilize the target
skill than were those who taught in a system in which the skill was
incongruent. Variable A, the model of the target skill provided by
the cooperating teacher, and the interaction of the two independent
variables did not have significant effects.


Table 1

Mean Skill-Utilization Scores of Subjects Assigned to Two Levels of Each
Independent Variable

                                   Subject teaches in classroom ecological system
                                   Which is accustomed to    Which is not accustomed to
                                   use of the target skill   use of the target skill
Cooperating teacher                   M        SD               M        SD            Overall

Provides model of use of
  target skill to subject            2.13     1.57             1.07     [illegible]     1.60
Does not provide model of use
  of target skill to subject         1.71     1.48              .94      .67            1.33
Overall                              1.92                      1.00                     1.46

Note. n = 8 for each group.


Discussion


The results of the present investigation would seem to support an
ecological interpretation of relationships between cooperating-teacher
behavior and student-teacher utilization of skills acquired during
microteaching. Such an interpretation focuses on the history of the
cooperating teacher’s behavior and the effects of that behavior on the
interconnected processes and events, called by Doyle and Ponder (1975)
an “ecological system,” that are typical of the classroom. From this
perspective the cooperating teacher’s consistent utilization of the
target skill in the classroom causes that skill to become a functional
part of the classroom’s ecological system. The pupils, for instance,
become accustomed to the skill and develop appropriate responses to
its use. Thus, when a student teacher who has completed training in
the use of the target skill enters such a classroom and attempts to
utilize the skill, that attempt fits the system. This ecological
congruence in turn reinforces the student teacher's use of the skill,
thereby increasing the likelihood that the skill will be utilized
again. On the other hand, when a student teacher attempts to utilize
the target skill in a classroom where the cooperating teacher has not
used it and, therefore, where the skill is not a part of the
ecological system, such use is not appropriate for the system as it
exists and is not reinforced. In fact, ecological incongruence may
produce aversive consequences that inhibit the use of the skill.
Thus, though the student teacher has received training in the skill,
use of the skill declines.

When results such as those found in the 
present study are reported, the inevitable 
question of generalizability arises. There 
would seem to be at least three dimensions 
of generalizability for this study: subjects, 
setting, and target skill. First, it cannot be 
convincingly argued that the subjects for this 
investigation represent any identifiable 
larger population other than all elementary 
credential students at the University of 
California, Santa Barbara. Their first-year 
graduate status (common in California), 
mean age, and sex may make them somewhat different from credential
students in other institutions. Still, they were beginning
student teachers with no previous teaching
experience and were therefore new to the
classroom.


Table 2

Summary of Analysis of Skill-Utilization Scores

Source                             df    MS            F

Model of target skill (A)           1    .60           .48
Classroom ecological system (B)     1    [illegible]   [illegible]
Interaction (A X B)                 1    [illegible]   [illegible]

*p < .05.

Second, the setting in which this investi- 
gation was located would seem to lend weight 
to assumptions of generalizability. This re- 
search was conducted in standard student- 
teaching settings with real teachers doing 
what real teachers do. The problems of 
generalizing from research to field setting are 
considerably less here than in standard
laboratory-based investigations.

Third, the target skill for this investiga- 
tion, asking probing questions, may be a 
special case and not representative of all 
teaching skills because its use by the teacher 
is especially dependent on pupil behavior. 
First, the very opportunity to ask a probing 
question depends on a previously occurring 
pupil response. Second, a teacher's tendency 
to use probes has an especially strong impact 
on pupil responses because, when responding 
to a teacher's initial question, a pupil in fact 
contracts to respond to a subsequent probe
as well. In a low-probe ecology, pupils may
be especially hesitant to respond to initial 
questions if it is expected that such response 
will lead to unaccustomed further involve- 
ment. It may therefore be argued that high 
dependence on pupil behavior sets the use of 
this skill apart from other teaching behaviors 
that commonly occur in the classroom. On 
the other hand, the skill’s dependence on 

pupil behavior may make asking probing 
questions peculiarly suitable for an investi- 
gation that intends to explore the effect of 
the classroom ecological system on stu- 
dent-teacher behavior. Here we have a 
teaching skill, the successful use of which is 
directly related to the behaviors of pupils in 
the classroom. If those behaviors are shaped 
and systematized by the presence of an 
ecological system, then the effects of that 
system should be detectable in teacher be- 
havior, as was found in the present investi- 
gation; thus, the choice of the skill asking 
probing questions. In order to increase
confidence in the generalization of the
findings reported here to other subjects,
settings, and teaching skills, additional
research will, of course, be necessary.

The above interpretation of results
suggests explanations for other previously 
puzzling discrepancies reported in the re- 
search literature. As stated earlier, a number 
of recent reports have cast doubt on the as- 
sumption that participation in a micro- 
teaching training sequence will of itself cause 
student teachers to exhibit, in their class- 
room subsequent to training, the skills that 
were the target of that training (Copeland, 
1975; Copeland & Doyle, 1973; Peterson, 
1973; Allen et al., Note 3; Brashear & Davis, 
Note 4). On the other hand, Borg (1972) 
tested the Far West Laboratory’s Minicourse 
9, which utilizes a microteaching format to 
train experienced teachers in an in-service 
setting. His follow-up data indicated that 
subjects established and sustained high 
levels of skill utilization subsequent to the 
initial training period. This apparent dis- 
crepancy may be explained in light of the 
present results. The Minicourse 9 program 
commonly requires trainees, who are expe- 
rienced teachers, to practice use of the target 
skills in their own classrooms as part of 
training. Such skill practice may not only 
increase the capability of the teacher to use 
the skill but also modify the structure of the 
classroom ecological system so that by the 
end of training, use of the skill has become 
a normal and accepted component of the 
ecological system. Continued use of the skill 
after training by the experienced teacher 
trainee may therefore be congruent with the 
system. On the other hand, the student 
teacher who has acquired the skill in the 
microteaching laboratory must enter a new 
classroom for student teaching. If the class- 
room’s ecological system has not been 
shaped by a history of the cooperating 
teacher's use of the target skill, the student 
teacher may not receive reinforcement for 
his or her attempts to use the target skill, and 
its utilization may decline. 

The results of the present research not 
only support the assumption of the existence 
of a classroom ecological system but also 
suggest that the effects of such a system may 
be more pervasive than previously expected. 
Previous research (Klein, 1971; Sherman & 
Cormier, 1974) has suggested that deliberate 
modification of specific pupil behavior is 
related to predictable changes in teacher 
behavior. The present results suggest further
that general pupil behaviors that are not 
deliberately influenced but are manifested 
as components of the ecological system 
through the natural course of classroom in- 
teraction are equally powerful in affecting 
teacher behavior. If this is true, the persis- 
tence of specific teaching behaviors over 
classrooms and over time identified by 
Hoetker and Ahlbrand (1969) and Bellack,
Kliebard, Hyman, and Smith (1966) may be
understood in part as a result of the sus- 
taining influence of classroom ecological 
systems. As new teachers enter the profes- 
sion, their behavior may first be shaped by 
the existing system and then, once shaped, 
may continue to conform to and reinforce 
that system as other newer teachers enter 
and are shaped by it. 

The implications of this line of reasoning
are compelling both for teacher-training
institutions that would attempt to matriculate
teachers who are trained to utilize a wide
variety of teaching strategies, some of which
are not generally found in contemporary
schools, and for researchers who would seek
to understand the variables affecting the
behavior of teachers and students in
classrooms.


Creativity and Achievement of Second Graders in Open and
Traditional Classrooms 


Susan G. Forman and James D. McKinney 
University of North Carolina at Chapel Hill 


The Wallach-Kogan Tests of Creativity and three subtests of the Iowa Tests 
of Basic Skills were administered to 129 second-grade students who had con- 
tinuous experience in either open classrooms or traditional classrooms as de- 
fined by the Walberg-Thomas Open Education Scale. Five classes of each type 
were studied in three different school systems. Although students in tradition- 
al classes in one school system had higher fluency scores than those in open 
classes, consistent differences in fluency were not found; no significant group 
effects were found for uniqueness scores. However, students in traditional 
classrooms scored consistently higher than those in open classrooms on mea- 
sures of vocabulary, reading, and mathematics achievement. 


The purpose of this study was to exam- 
ine academic achievement and creativity of 
second-grade students with continuous ex- 
perience in either open or traditional self- 
contained classes. In recent years, a growing 
number of school systems have adopted open 
classroom programs based on the assump- 
tions and practices of the British Infant 
Schools (Kohl, 1970; Rathbone, 1971; Weber, 
1971). Although there are many opinions 
about what constitutes open education, 
several characteristics have been identified 
that distinguish open classrooms from more 
traditional self-contained classrooms 
(Traub, Weiss, Fisher, & Musella, 1972; 
Walberg & Thomas, 1971). At the same time, 
the evaluation of open classrooms has been 
based primarily on testimonials by expo- 
nents as opposed to studies of educational 
outcomes (Katz, 1973). 

Advocates of open education generally 
regard the development of creativity—or the 
ability to deal with information in a pro- 
ductive and innovative fashion—as an edu- 
cational objective of major importance 
(Nyquist, 1971; Rogers, 1970; Spodek, 1971). 
Nevertheless, studies that have compared 
students in open and traditional classrooms
on measures of creativity have produced 
equivocal results. Haddon and Lytton (1968) 
found that students who attended schools 
that emphasized “informal, progressive 
teaching” scored higher on divergent pro- 
duction tests than did students who were 
taught by more formal methods. Also, a fol- 
low-up study 4 years later indicated that 
these differences were maintained (Haddon 
& Lytton, 1971). Wilson, Stuckey, and 
Langevin (1972) reported that 11- and 12- 
year-olds who attended an open plan school 
for 6 years were superior to students in a new 
open plan school and those in two traditional 
schools on 5 out of 18 creativity measures. 
More recently, Ramey and Piper (1974) 
tested students in a private open plan school 
and a traditional school with the Torrance 
(1966) Tests of Creative Thinking and found 
that while open school students were supe- 
rior in figural creativity, traditionally taught 
students were superior in verbal creativity. 
However, contradictory results have been 
reported by Ward and Barcher (1975) who 
compared high- and low-IQ elementary 
students in open and traditional classes. 
They found that among high-IQ children, 
subjects in traditional classes scored higher 
on figural creativity tests than did those in 
open classes. Type of classroom was not a 
significant factor in either verbal or figural 
creativity among low-IQ subjects. Finally, 
Wright (1975) found no differences between
fifth-grade students who had been placed in
either open or traditional classes for 3 years
on the Torrance tests and three measures of
personality.

In addition to mixed results from studies
of creativity, impressive evidence has not
been obtained to suggest that students in
open classrooms achieve more or less than
students in traditional classrooms. Gardner
(1966) reported that conventional classes
produced higher reading scores at the Infant
School level, but that open classes were
superior at the Junior School level. Two recent
studies found no consistent differences in
reading between open and traditional classes
(Trotta, 1973; Tuckman, Cochran, & Travers,
Note 1). On the other hand, Wright
(1975) found differences in favor of
traditional classes on six of nine achievement
measures. Ward and Barcher (1975) found
that although open and traditional classes
did not differ in reading achievement for
low-IQ subjects, traditional classes were
superior for high-IQ subjects.

In sum, although the open classroom
philosophy has become very popular over the
past decade, relatively little is known about
the impact of open classrooms on student
achievement and creativity; available studies
have yielded equivocal findings. Ward and
Barcher (1975) have noted that in a number
of studies, relevant subject variables have
not been controlled adequately and/or the
independent variable had not been defined
objectively. Therefore, in the present study
five open and five traditional classrooms
were selected by an independent observer
who used the Walberg-Thomas (1972) Open
Education Scale; groups were matched on
chronological age, IQ, racial distribution, and
socioeconomic level. Also, to assess the
generality of findings, classes were selected from
three different school systems that [...]
predominately suburban, rural, and
university communities.


Method


Subjects


Subjects were 129 second graders, 66 boys and [...]
girls, with continuous educational experience in [...]
open or traditional self-contained classrooms in [...]
variance on chronological age indicated no difference
between groups or sexes. Similarly, no significant
differences were found for IQ as measured by the Primary
Mental Abilities Test (Thurstone & Thurstone, 1962).
Socioeconomic status was measured by the
Hollingshead (Note 2) scale for occupation of parents. A
chi-square test showed that the distribution of
socioeconomic classes was comparable among males and
females and among open and traditional classes. In [...]
the four [...].


Table 1

Subject Characteristics

Columns: Open classes (Male, Female); Traditional classes (Male, Female).
[Cell values are only partly legible in the source and cannot be
assigned to columns with confidence; legible values are listed in
the order in which they appear.]

Chronological age (months)    M: 95.41, 97.10, 96.97     SD: 3.34, 3.55, 3.49
IQ                            M: 101.69, 100.65, 100.74  SD: 14.59, 12.38, 9.75
Race: White                   21, 22, 25
Socioeconomic status (Middle, Lower middle, Lower; row assignment
  uncertain)                  9, 9 / 12, 7, 18, 16 / 4, 5, 2, 2


Tests and Procedure


All testing was done in the child's school by
female examiners in April and May of 1974. The
Wallach-Kogan Tests consist of three verbal [...].
In the Instances subtest, the child
is asked to name as many things as he can that belong
to a specific class, for example, things that are round.
The Alternate Uses subtest requires the subject to name
different ways to use a common object such as a news- 
paper or a chair. In the Similarities subtest, the child 
describes different ways two objects are alike, for ex- 
ample, a cat and a mouse. The two figural subtests are 
called Pattern Meanings and Line Meanings. In each, 
the subject is shown an abstract shape or line configu- 
ration and is asked to tell the examiner all the different 
things it might be, for example, “What does it make you 
think of?” 

Each subtest was individually administered following 
the procedures recommended by Wallach and Kogan 
(1965). Subjects were told that the examiner was trying 
to make up some new games for children and that they 
could take as long as they wished with each item. All 
responses were recorded verbatim and each response 
was scored as either appropriate or inappropriate based 
on the agreement of two judges. An inappropriate re- 
sponse was one judged to be inadequate or irrelevant 
given the requirements of the problem. For example, in 
naming alternate uses for an automobile tire, one child 
answered, “Feed it to your dog.” Also, repetitive (per- 
severative) responses were regarded as inappropriate. 

After eliminating all inappropriate responses, fluency 
and uniqueness scores were computed for each subtest. 
The fluency score on a given subtest was the total 
number of appropriate responses summed over items. 
The uniqueness score was the total number of responses 
that appeared only once in the study sample for a given 
item. For example, a unique response to “Name all the 
things that will make noise” was “a copying machine.” 
A more complete description of the creativity subtests 
and scoring procedures can be found in Wallach and 
Kogan (1965). 
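
The scoring rule described above can be sketched in a few lines. This is an illustrative reading of the rule, not the scoring procedure actually used: it assumes that inappropriate and repetitive responses have already been removed by the judges, and it treats two responses as the same when they match after trimming whitespace and ignoring case (a simplification introduced here; the original judgments of equivalence were made by people). The function names `score_item` and `score_subtest` and the sample responses are hypothetical.

```python
from collections import Counter


def normalize(response):
    """Hypothetical equivalence rule: ignore case and surrounding whitespace."""
    return response.strip().lower()


def score_item(responses_by_child):
    """Score one item.

    responses_by_child maps a child ID to that child's list of appropriate
    responses. Returns (fluency, uniqueness) dicts keyed by child ID.
    """
    # How often each response occurs in the whole sample for this item.
    counts = Counter(
        normalize(r) for responses in responses_by_child.values() for r in responses
    )
    fluency = {child: len(rs) for child, rs in responses_by_child.items()}
    # A response is unique if it appears only once in the sample for the item.
    uniqueness = {
        child: sum(1 for r in rs if counts[normalize(r)] == 1)
        for child, rs in responses_by_child.items()
    }
    return fluency, uniqueness


def score_subtest(items):
    """Sum item-level fluency and uniqueness over all items of one subtest."""
    fluency_total, uniqueness_total = Counter(), Counter()
    for responses_by_child in items:
        fluency, uniqueness = score_item(responses_by_child)
        fluency_total.update(fluency)
        uniqueness_total.update(uniqueness)
    return dict(fluency_total), dict(uniqueness_total)


# Hypothetical responses of three children to two items.
example_items = [
    {"c1": ["a copying machine", "drum"], "c2": ["drum", "bell"], "c3": []},
    {"c1": ["ball"], "c2": ["ball", "coin"], "c3": ["plate"]},
]
example_fluency, example_uniqueness = score_subtest(example_items)
```

Counting uniqueness within each item, rather than across the whole subtest, follows the wording "appeared only once in the study sample for a given item."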

In addition to the Wallach-Kogan tests, each child 
was given the Vocabulary, Reading Comprehension, and 
Mathematics Skills subtests of the Iowa Tests of Basic
Skills, Level 8-Form 6 (Hieronymus & Lindquist, 1972) 
and the Primary Mental Abilities Test (Thurstone & 
Thurstone, 1962). These tests were administered to 
groups of 12 to 18 children in separate sessions accord- 
ing to the instructions given in the manuals. 


Educational Environment 


Students were selected from five open classrooms and
five traditional self-contained classrooms. Each open
class was housed in an open plan building. Team
teaching was used; the classroom environment was
organized according to learning/interest centers. Open
classes contained either two or three teachers and from
50 to 75 children. Traditional classes were housed in
buildings that featured self-contained classrooms with
one teacher and approximately 26 children. The
Walberg-Thomas Open Education Scale was used to
assess physical and instructional characteristics of the
10 educational settings. This scale provides ratings on
a 4-point continuum for each of 50 items that measure
eight major dimensions: instruction, provisioning for
learning, diagnosis of learning events, reflective
evaluation of diagnostic information, humaneness, seeking
opportunities to promote growth, self-perception of
teacher, and assumptions about children and the
process of learning.

Observations were done by a staff member of the
North Carolina State Department of Public Instruction, 
Division of Research, who had been trained to admin- 
ister the scale as part of a statewide evaluation of Early 
Childhood Education Centers. Each classroom was 
observed for 2 half-day periods before the ratings were 
made. Results of the observation rating scale indicated 
pronounced differences between the open and tradi- 
tional self-contained units. The average total score for 
the five open classrooms was 171.40 (SD = 10.82), while 
that for the five traditional classrooms was 86.0 (SD = 
11.44). In general, these results compare favorably to 
those reported by Evans (1971) in a study of 20 open 
classes in the United States (M = 163.17, SD = 14.08), 
20 open classes in Great Britain (M = 160.80, SD = 
13.07), and 20 traditional classes in the United States 
(M = 117.46, SD = 19.59). Thus, evidence was obtained 
to suggest that open classrooms which were selected for 
this study followed practices that were consistent with 
open education philosophy and were differentiated from 
those selected as traditional. 


Results 


The Wallach-Kogan fluency scores, uni- 
queness scores, and Iowa standard scores 
were analyzed separately by using a 2 X 2 
(Sex X Group) multivariate analysis of 
variance (Triangle Universities Computation 
Center, Note 3). To assess the generality of 
effects across different school systems and 
to provide additional control on subject 
variation, this analysis was repeated for each 
school system with IQ covaried. 


Creativity Measures 


Table 2

Means and Standard Deviations for Fluency Scores

                       Open classes         Traditional classes
Subtest                Male     Female      Male     Female

Instances
  M                    33.28    26.16       31.88    31.53
  SD                   19.19    11.37       21.90    15.53
Alternate Uses
  M                    20.75    18.55       19.15    21.03
  SD                    7.88     5.28        6.76     9.20
Similarities
  M                    21.84    20.74       22.71    25.91
  SD                   12.23     6.48        8.90     9.31
Pattern Meanings
  M                    18.91    23.29       16.00    19.19
  SD                   13.04    19.98        5.57     6.77
Line Meanings
  M                    15.94    17.16       18.26    20.84
  SD                    7.43     5.42        6.52     7.60


Table 3

Means and Standard Deviations for Uniqueness Scores

                       Open classes         Traditional classes
Subtest                Male     Female      Male     Female

Instances
  M                     3.00     4.90        3.41     3.00
  SD                    4.22     6.00        6.63     3.40
Alternate Uses
  M                     2.38     2.23        1.82     2.25
  SD                    2.51     2.45        2.22     [illegible]
Similarities
  M                     1.91     1.26        1.44     2.53
  SD                    2.96     1.59        1.62     2.13
Pattern Meanings
  M                     3.47     2.74        3.29     4.13
  SD                    2.87     2.28        2.87     2.98
Line Meanings
  M                     2.25     2.28        2.87     2.98
  SD                    2.17     1.83        2.89     2.15


Fluency. Table 2 shows the means and
standard deviations for the fluency scores on
each subtest of the Wallach-Kogan Tests of
Creativity. The analysis of these data yielded
a significant multivariate main effect for 
groups, F(5, 121) = 2.63, p < .03, based on 
Wilks's lambda criterion. The interaction 
and sex effect were not significant. Univar- 
iate analyses for each subtest indicated that 
students in traditional classrooms scored 
higher than those in open classrooms on the 
Line Meanings subtest, F(1, 125) = 6.17, p 
< .01, and tended to score higher on the 
Similarities subtest, F(1, 125) = 3.15, p <
.08. Multiple comparisons by the Scheffé
method revealed that girls in traditional 
classrooms were more fluent on Line 
Meanings than were boys in open class- 
rooms. 

The analysis of covariance by school system
failed to show uniform effects across
systems. Although a significant multivariate
main effect due to groups was found for the
suburban system, F(5, 39) = 3.07, p < .02, no
significant main effects or interactions were
found for either the rural or university
systems. Univariate analyses within the
suburban system showed that students in
traditional classrooms were more fluent on
Alternate Uses, F(1, 43) = 4.04, p < .05, and
Line Meanings, F(1, 43) = 8.64, p < .01,
whereas those in open classrooms were more
fluent on Pattern Meanings, F(1, 43) = 4.84,
p < .03.

Uniqueness. Means and standard
deviations for uniqueness scores are shown in
Table 3. The multivariate analysis of
variance on the five subtests revealed no
significant main effects; however, the interaction
approached significance, F(1, 121) = 2.12, p 
< .07. Univariate analyses indicated a sig- 
nificant Group X Sex interaction for the 
Similarities subtest, F(1, 125) = 5.31, p < 
.02. Boys had higher uniqueness scores in 
open classrooms, while girls scored higher in 
traditional classrooms. 

When the data were analyzed by school
system, no significant main effects were
found. However, the interaction was
significant for the suburban system, F(5, 39) =
2.63, p < .04, and the univariate analysis for
the Instances subtest indicated that boys in
traditional classes tended to score higher
than boys in open classes, whereas girls
performed better in open classes. Thus, the
data failed to show clear-cut differences
between open and traditional classrooms in
the ability to produce unique ideas.


Achievement 


The average standard scores for each
subtest on the Iowa Tests of Basic Skills are
given in Table 4. Analysis of these data
revealed significant multivariate main effects
for both groups, F(3, 123) = 6.72, p < .001,
and sex, F(3, 123) = 2.91, p < .04. The
interaction was not significant. Students in
traditional classrooms showed higher
achievement than those in open classrooms
on all three measures: reading, F(1, 125) =
8.56, p < .004; vocabulary, F(1, 125) = 5.16,
p < .02; and mathematics, F(1, 125) = 19.80,
p < .001. Girls scored higher than boys on
the reading, F(1, 125) = 7.53, p < .01, and
vocabulary, F(1, 125) = 6.18, p < .01,
subtests.


Table 4

Means and Standard Deviations for Iowa Tests of Basic Skills
Standard Scores

                       Open classes         Traditional classes
Variable               Male     Female      Male     Female

Reading
  M                    50.22    54.32       54.97    64.78
  SD                   15.28    14.67       15.16    12.85
Vocabulary
  M                    51.25    57.81       57.62    63.[illegible]
  SD                   15.17    15.03       12.44    12.85
Mathematics
  M                    47.34    48.03       56.21    59.25
  SD                   16.96    13.13        9.06    10.71
Composite
  M                    45.44    49.84       53.47    60.75
  SD                   17.98    15.81       12.94    13.10


Also, significant multivariate group effects
in favor of traditional classes were found for
all three school systems: rural, F(3, 41) =
5.73, p < .003; suburban, F(3, 41) = 6.60, p
< .001; and university, F(3, 26) = 4.28, p <
.01. Univariate analyses indicated that these
effects were uniform across all three
measures for two school systems and for two of
the measures in the third, .05 < p < .001. Sex
effects in favor of girls were found in two
systems on reading and vocabulary, .03 < p
< .05. In the analysis for the university
system, a significant multivariate interaction
indicated that boys tended to score higher in
traditional classes, while girls did somewhat
better in open classes, F(3, 26) = 3.17, p < .04;
however, none of the univariate interactions
or sex effects were significant.


Discussion 


In studies of this nature in which treat- 
ments are applied to classes of children, it 
can be argued that the analysis should be 
carried out on classroom means or medians 
rather than on individual student scores. 
Therefore, our decision to analyze the data 
as reported above requires some discussion. 
First, subjects were not sampled solely by 
class unit, but were selected only if they (a) 
had been placed in either open or traditional 
classrooms for 2 consecutive years and (b) 
had IQs between 80 and 120. Thus, in the 
present study, the class mean in a given 
variable would not represent the response of 
that class as a whole. Second, in traditional 
classes, students were taught by one teacher 
as a self-contained unit, whereas in open
classes they were taught by several different
teachers at the same time. Thus, the class
mean for an open setting would represent an
administrative grouping of students rather
than an instructional unit as was the case in
traditional settings. Finally, a key
assumption in the present study was that the effects
of treatments were cumulative over time. An
analysis of class means would obscure the
fact that treatments were applied to students
in different open or traditional classes in
previous years.

Nevertheless, to insure that the require- 
ments for statistical inference had been met 
and to provide additional evidence on the 
generality of the findings, a secondary 
analysis by classroom units was carried out. 
Class means for each of the five traditional 
and five open settings were calculated for 
each sex and a series of 2 X 2 (Sex X Group) 
analyses of variance were run on Iowa stan- 
dard score means and Wallach-Kogan total 
fluency and uniqueness score means. Con- 
sistent with the findings reported above, 
traditional classrooms showed higher 
achievement than open classrooms, F(1, 16) 
= 7.59, p < .01, and girls showed higher 
achievement than boys, F(1, 16) = 18.47, p 
< .001. The average Iowa composite mean 
for traditional classes was 57.45 (SD = 6.33) 
and that for open classes was 47.62 (SD = 
11.81). In general, no significant effects were 
obtained in the analysis of class means on 
creativity scores. 
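
The structure of this secondary analysis can be illustrated with a small sketch of a balanced 2 X 2 (Sex X Group) analysis of variance computed on class means. With five classes per group and one mean per sex in each class there are 20 values, so the error term has 20 - 4 = 16 degrees of freedom, matching the F(1, 16) reported above. The function name `two_way_anova` and the data are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's class means, and the sketch does not claim to reproduce the original computation.

```python
def two_way_anova(cells):
    """Balanced two-factor ANOVA with one value per class mean.

    cells maps (sex, group) to a list of class means; every cell must hold
    the same number of values. Returns {effect: (F, df_effect, df_error)}.
    """
    sexes = sorted({s for s, _ in cells})
    groups = sorted({g for _, g in cells})
    n = len(next(iter(cells.values())))  # values per cell
    if any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced")

    def mean(xs):
        return sum(xs) / len(xs)

    cell_mean = {k: mean(v) for k, v in cells.items()}
    grand = mean([x for v in cells.values() for x in v])
    sex_mean = {s: mean([cell_mean[(s, g)] for g in groups]) for s in sexes}
    group_mean = {g: mean([cell_mean[(s, g)] for s in sexes]) for g in groups}

    # Sums of squares for main effects, interaction, and within-cell error.
    ss_sex = n * len(groups) * sum((sex_mean[s] - grand) ** 2 for s in sexes)
    ss_group = n * len(sexes) * sum((group_mean[g] - grand) ** 2 for g in groups)
    ss_inter = n * sum(
        (cell_mean[(s, g)] - sex_mean[s] - group_mean[g] + grand) ** 2
        for s in sexes
        for g in groups
    )
    ss_error = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_sex, df_group = len(sexes) - 1, len(groups) - 1
    df_inter = df_sex * df_group
    df_error = len(sexes) * len(groups) * (n - 1)
    ms_error = ss_error / df_error
    return {
        "sex": (ss_sex / df_sex / ms_error, df_sex, df_error),
        "group": (ss_group / df_group / ms_error, df_group, df_error),
        "sex x group": (ss_inter / df_inter / ms_error, df_inter, df_error),
    }


# Hypothetical class means: five classes per group, one mean per sex.
hypothetical_cells = {
    ("female", "open"): [1.0, 2.0, 3.0, 4.0, 5.0],
    ("female", "traditional"): [3.0, 4.0, 5.0, 6.0, 7.0],
    ("male", "open"): [0.0, 1.0, 2.0, 3.0, 4.0],
    ("male", "traditional"): [2.0, 3.0, 4.0, 5.0, 6.0],
}
anova_result = two_way_anova(hypothetical_cells)
```

Note that this sketch treats the boys' and girls' means from the same class as independent observations, which is what a 16-degree-of-freedom error term implies.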

The findings with respect to achievement 
generally support those previously reported 
by Wright (1975) for fifth-grade students, by 
Ward and Barcher (1975) for high-IQ chil- 
dren, and by Gardner (1966) for children in 
the British Infant Schools. However, it 
should be noted that the average IQ of 
subjects in the present study was comparable 
to that of the low-IQ group studied by Ward 
and Barcher (1975). Since achievement dif- 
ferences between open and traditional 
classes were found for each of three school 
systems when IQ was covaried, it can be 
concluded that the superiority of traditional 
students in the present study was not re- 
stricted to high-IQ children, as was found by 
Ward and Barcher (1975). Also, it should be 
noted that achievement differences were 
found consistently across three different skill 
areas, that is, vocabulary, reading, and 
mathematics. E 

Although the findings with respect to 
measures of creativity are supported by 
Ward and Barcher (1975) and Wright (1975), 
they conflict with those of previous studies 
that have found differences in favor of open 
classrooms (Haddon & Lytton, 1968, 1971; 
Wilson et al., 1972). As Ward and Barcher 
pointed out, one limitation of earlier studies 
was the failure to operationalize the dis- 
tinction between open and traditional 
classes. Also, an additional criticism of pre- 
vious studies, as well as of this one, is that 
subjects were not assigned to treatments at 
random and/or that premeasures were not 
taken to control for individual differences in 
initial performance. Since both the present 
study and those by Wright (1975) and Ward 
and Barcher (1975) were able to assess dif- 
ferences in the degree of “openness” among 
classrooms and to match groups on a variety 
of relevant subject variables, perhaps greater 
confidence can be placed in the findings. 
However, several other issues should be 
considered in interpreting these results as 
well as those reported by Ward and Barcher 
(1975). First, both Haddon and Lytton 
(1968) and Wilson et al. (1972) used 11- and 
12-year-olds in their studies. Although 
Wright (1975) found no differences at this 
age level, it is possible that an open envi- 
ronment has different effects on creativity 
at different periods of development. Also, 
perhaps the amount of experience in a given 
environment influences creative expression. 
Wilson et al. (1972) reported that students 
who had [...] an open plan school for 6 
years were more creative than those in an[...] 
school which had been in operation [...]. 
Finally, it is entirely possible that individual 
differences in creative expression are 
determined by factors [...] environment. For 
example, research by Schaefer (Note 4) 
on parent involvement and parent-teacher 
[...] has yielded substantial correlations 
of parent educational beliefs, values, and 
practices with measures of creativity/ 
curiosity and basic skills in kindergarten and 
first-grade [...]. 

Nevertheless, the present study provides 
additional evidence that questions several 
key assumptions of open education, espe- 
cially the proposition that young children in 
the primary grades learn best from self-directed 
[...] (Barth, 1971; Weber, [...]). 
[...] creativity as measured by 
the Wallach-Kogan tests has been shown to 
[...] information base for successful performance, 
since the measures emphasize verbal [...] 
and associative processes. Also, since Wallach 
(1970) has pointed out that little is 
known about the relationship between specific 
classroom practices and creativity, research 
on the nature of student-teacher interaction 
and specific open education practices 
may make a greater contribution than 
studies of open and traditional environments 
globally defined. 

























Generative Processes in Reading Comprehension 


Marleen Doctorow 
California State University, Long Beach 


M. C. Wittrock and Carolyn Marks 
University of California, Los Angeles 


In the generative model of learning with understanding, reading
comprehension occurs when readers actively construct meaning for text. In
two experiments with a total of 488 sixth graders, where time to learn was
held constant across all treatments, it was predicted and found that the
facilitation of generative processes by the insertion of paragraph headings
and instructions to generate sentences about story paragraphs during
encoding produced the greatest comprehension, followed in turn by
instructions to generate sentences, the insertion of paragraph headings,
and then by reading the same stories without generative instructions or
paragraph headings. The combination of inserted paragraph headings and
instructions to generate sentences about paragraphs approximately doubled
comprehension and recall in each experiment.


In the generative model of learning with 
understanding (Wittrock, 1974), reading 
comprehension is facilitated when, during 
encoding, learners use their memories of 
events and experiences to construct mean- 
ings for the text. The words, sentences, and 
paragraphs in a given context are the re- 
trieval cues that stimulate semantic pro- 
cessing of information stored in memory. 
From the semantic processing of abstract 
and concrete memories, readers generate 
meanings for the text. The actively con- 
structed individualized meanings represent 
each learner’s comprehension of the text. 

The generative model predicts that the 
comprehension of text is facilitated when 
one stimulates learners to construct mean- 
ingful elaborations for the text and when one 
provides semantic retrieval cues to enhance 
the recall of information relevant to the 
construction of these elaborations. These 
predictions can be tested in several ways. 

One way to test them is to insert high- 
frequency words into the text. According to 
the model, these high-frequency words serve
as semantic retrieval cues that facilitate recall
of information and experience relevant to the
construction of meaning for the text.

In three experiments with sixth-grade 
students, Marks, Doctorow, and Wittrock 
(1974) found that reading comprehension 
was statistically significantly increased
(p < .01) when one or two higher frequency words
per sentence were substituted for synonymous
words of lower frequency.

Another way to test the predictions is to 
use a familiar story as a set of retrieval cues
that can help learners to recall information 
useful for the construction of meanings of 
unfamiliar and undefined vocabulary words. 
In two experiments with sixth-grade stu- 
dents, Wittrock, Marks, and Doctorow 
(1975) found that familiar stories facilitated 
the learning of definitions of unfamiliar, 
undefined vocabulary words and the com- 
prehension of the text. Apparently, the fa- 
miliar story served as the context that cued 
the retrieval of memories relevant to the 
construction of meanings for the embedded 
vocabulary words. 

In addition to providing semantic retrieval 
cues, one can test predictions of the genera- 
tive model by use of instructions to generate 
associations among words (Wittrock & 
Carter, 1975) or to generate pictures to rep- 
resent the meanings of the words (Bull & 
Wittrock, 1973). Both of these procedures 
facilitated learning and recall. For example,
in the Wittrock and Carter study, instructions to generate associations
among hierarchically arranged words sizably increased free recall. In the
same study, appropriate inserted hierarchical retrieval cues also
facilitated free recall. In the following two studies, we use variations of
these two variables, generative instructions and inserted retrieval cues, to
test further implications of the model and to improve reading comprehension
and recall.

In these studies, to facilitate the construction of meaningful elaborations
of the text, the readers were instructed to write sentences from memory
about each paragraph [...] that they wrote during encoding. To facilitate
retrieval of relevant information, paragraph headings for each paragraph of
the stories were inserted into the text. The [...] meaningful elaborations
and if the inserted retrieval cues increase the number of paragraph headings
used in these meaningful elaborations, then it is possible to determine the
effects of the treatments on comprehension [...] stories without the
headings (Cs), read only the paragraph headings (Ch), or read an unrelated
story (Cu). Provided that the generative instructions facilitate the
construction of meaningful elaborative sentences, it is predicted that the
treatments with the [...] comprehension and recall compared with control
treatments. [...] instructions to generate original elaborations [...]
enhance comprehension and recall to a greater extent than are inserted
paragraph headings because, unlike the generative instructions, the headings
are neither necessarily sufficient for generating elaborations about the
text nor necessarily adequate for retrieval of related information. Although
this prediction is made for both the high-ability readers and the
low-ability readers, it is more likely to occur with the low-ability
readers. They may not spontaneously elaborate sentences as readily as do the
high-ability readers.

Finally, it is predicted that treatments with both paragraph headings and
generative instructions (GR1 and GR2) produce the [...] information and the
construction of meaningful elaborative sentences.

Therefore, the following hypotheses were tested:

[...] additional prediction that Ch > Cu.

Hypothesis 2. The insertion of paragraph headings that serve as retrieval
cues in a story (R1 and R2) increases comprehension and recall.

[...] R2).

Hypothesis 4. Treatments with both paragraph headings and generative
instructions (GR1 and GR2) produce the greatest comprehension and recall of
all the treatments in the experiments.


Method


Experimental Design


In each of two experiments, a simple randomized between-subjects design was
used, with individuals
randomly assigned within sex and reading level to the
eight treatments. To test the effects of generative 
processes on story comprehension, it was necessary to 
use stories and tests appropriate in difficulty for each 
of two levels of reading ability. As a result, the stories, 
comprehension tests, and recall tests differed across the 
two ability levels, making it necessary to use two ex- 
periments instead of one. 


Subjects 


Subjects were 488 10- to 12-year-old children, the 
entire sixth-grade population of three West Los Angeles 
elementary schools. Using Science Research Associates 
(SRA) Reading Placement Test scores (Parker & 
Scannell, 1969), students were divided into high- (n =
248) and low- (n = 240) ability groups. Scores of 1-7
defined the low-ability group; scores of 8-14 defined the 
high-ability group. The placement test consisted of a 
practice story and two test stories accompanied by 14 
multiple-choice comprehension questions. 


Materials 


Low-frequency versions of two commercially pub- 
lished stories commonly used in school to teach reading, 
“The Mirror” and “Conductor Moses,” were used as the 
treatment materials. “The Mirror,” Story Number 5 
in the Tan Series, consists of 372 words and is at the 
lowest level of the SRA Power Builder Reading Labo- 
ratory Kit IIB (Parker & Scannell, 1969). “Conductor 
Moses,” Story Number 15 in the Silver Series, consists 
of 1,125 words and is at the highest level of the same 
SRA kit. “The Mirror” was used for students who 
obtained a score of 1-7 on the SRA Reading Placement 
Test, and “Conductor Moses” was used for students 
who obtained a score of 8-14 on the SRA Reading 
Placement Test. (Permission to use these stories was 
granted by the publishers.) 
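
The grouping and story-assignment rule described above can be summarized in a short Python sketch. This is only an illustration: the function names `ability_group` and `assigned_story` are hypothetical, and rejecting scores outside 1-14 is an assumption, since the text defines only those two score ranges.

```python
# Cutoffs and story titles come from the text; everything else is illustrative.
STORY_BY_GROUP = {"low": "The Mirror", "high": "Conductor Moses"}


def ability_group(placement_score: int) -> str:
    """Return 'low' for placement scores 1-7 and 'high' for scores 8-14."""
    if not 1 <= placement_score <= 14:
        # Assumption: scores outside the two defined ranges are rejected.
        raise ValueError("placement score must be between 1 and 14")
    return "low" if placement_score <= 7 else "high"


def assigned_story(placement_score: int) -> str:
    """Return the treatment story used for a student's ability group."""
    return STORY_BY_GROUP[ability_group(placement_score)]
```

For example, `assigned_story(7)` gives "The Mirror" and `assigned_story(8)` gives "Conductor Moses".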

To increase the difficulty of the materials and yet 
keep them appropriate for sixth-grade children, low- 
frequency words at their grade level were inserted into 
the two stories. Each story was divided into consecu- 
tive blocks of six words. Within each block, the fre- 
quency value of each noun, adjective, verb, and adverb 
was determined from the Carroll, Davies, and Richman 
(1971) tables, which report frequency values and grade 
levels for 86,741 words. Based on the table’s frequency 
values, each noun, adjective, verb, and adverb in the 
original SRA stories was classified as a low-frequency 
word if it occurred less than 50 but at least one time per 
50,000 words or was classified as a high-frequency word 
if it occurred more than 50 times per 50,000 words. 
Within each block of six words, the high-frequency word 
most amenable to substitution was replaced by a synonymous
low-frequency word of the same part of
speech, of comparable length, and of appropriate grade 
level as specified by Carroll et al. (1971). The result of 
this procedure was a low-frequency version of each of 
the two stories, which differed from the original version 
in the frequency value of approximately 15% of the 
words of the passage. 
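
The classification rule and the six-word blocking can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the frequency values would have to come from the Carroll et al. (1971) tables, and the choice of the word "most amenable to substitution" was a judgment call that is not modeled here.

```python
from typing import List, Optional


def classify_frequency(per_50k: float) -> Optional[str]:
    """Classify a word by its occurrences per 50,000 words.

    'high' means more than 50 occurrences; 'low' means at least 1 but
    fewer than 50. The rule as stated does not cover exactly 50 or
    fewer than 1, so those cases return None (an assumption).
    """
    if per_50k > 50:
        return "high"
    if 1 <= per_50k < 50:
        return "low"
    return None


def six_word_blocks(words: List[str]) -> List[List[str]]:
    """Split a story into consecutive blocks of six words."""
    return [words[i:i + 6] for i in range(0, len(words), 6)]
```

Replacing one word in each six-word block would change about 1 word in 6, which is close to the approximately 15% of words reported as differing between versions.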

Readability of the treatment passages was held constant
by keeping (a) syntactical and grammatical elements
of the old and the new version identical, (b)
substituted word lengths for the two versions analogous, 
and (c) substituted words for the two versions restricted 
to those words that had occurred in children’s texts at 
the sixth-grade level. 


Treatments 


In each experiment, there were five experimental 
treatments and three control treatments. 

R1. In the one-word paragraph-heading treatment,
a one-word retrieval cue (R1), a noun, was given above 
each paragraph of the story. The noun that occurred 
most often in each paragraph was determined. If that 
noun was the name of a story character, it was used as 
the paragraph heading. For all other nouns, a synon- 
ymous noun of higher frequency value than the noun in 
the story was used as each paragraph heading. In this 
treatment, the students were instructed to read the 
story and the paragraph heading. In the example 
presented in the R2 section that follows, the one-word
heading "letter" was used. 

R2. In the two-word paragraph heading treatment
(R2), a two-word retrieval cue was given above each
paragraph of the story. The first word was the same
word that appeared in the R1 treatment. The second
word was selected as follows: First, the direct object of 
the first word and the second most often occurring 
subject in the paragraph were determined. Second, a 
synonymous word of a higher frequency than the direct 
object or subject, whichever occurred more often in the 
paragraph, was used as the second word of the heading 
of each paragraph of the story. The two-word heading 
generally abstracted a central part of the theme of each 
story paragraph. In this treatment, as in R1, the
students were instructed to read the story and the
paragraph headings.

The following story paragraph and its two-word 
headings were used in the experiment with the high- 
ability readers: 


Letter: Escape 
To be assured her brothers would be prepared, she
had prepared a message in advance. Since specific
officials examined all of the slaves’ mail, Harriet’s 
message was addressed to a man named Jacob 
Johnson, who secretly assisted the Underground 
Railroad, and who was one of the relatively few free
black men in Maryland. However, even Jacob’s 
mail might be searched, so Harriet had to be cau- 
tious. Her message stated: “Inform my brothers 
to be always devoted to prayer, and when the sturdy 
aged fleet of vigor glides along to be prepared to 
unite aboard.” (Adapted from Parker & Scannell, 
1969, Silver Series, Story Number 15, paragraphs 4 
and 5.) 


G. In the generation treatment (G), a blank space 
was provided above each paragraph of the story. No 
paragraph headings were given in this treatment. The 
children were instructed to generate and to write their 
own sentence about what happened in the paragraph 
after they read each paragraph of the story.

GR1. In the treatment combining the one-word
paragraph heading with the generative instructions
(GR1), the single word used in the R1 treatment and a
blank space were given above each paragraph. The
instructions were the same as those used in the G
treatment, except that the students were asked to write
each sentence using the one-word paragraph heading.

GR2. In the treatment combining the two-word
paragraph heading with generative instructions (GR2),
the two words used in the R2 treatment and a blank
space were given above each paragraph of the story.
The instructions were the same as those used in the GR1
treatment, except that the students were asked to write
each sentence using the two words provided above each
paragraph.

Cs. In the control story treatment (Cs), neither
paragraph headings nor generation instructions were 
given. The students were instructed to read the same 
story that the children in the experimental groups 
read. 

Ch. In the control heading treatment (Ch), only the
two words that were given above each story paragraph
in treatments R2 and GR2 were presented. The story
was not presented. The students were instructed to
read the paragraph headings and to think of a paragraph
that might be written using each set of two words for the
paragraph heading.

Cu. In the control unrelated-story treatment (Cu),
neither the organizers nor the experimental story were
given. Instead, an unrelated low-frequency version of 
an SRA story at the appropriate grade level was used. 
The students were instructed to read this story. 


Comprehension Test 


[...]ment. There were 20 items in “The Mirror” comprehension test and 28
items in the “Conductor Moses” comprehension test. The odd-even reliability
coefficients, [...] the Spearman-Brown formula (Cronbach, 1960, p. 161), for
these comprehension tests were .93 for “The Mirror” and .96 for “Conductor
Moses.” To determine the effects of se[...]

Noncued Inferential Meaning subtest (NIM). The correct answers for these
items required the use of more [...] paragraph, were used in the experiment
with the high-ability readers:


Slaves: Danger
The Underground Railroad ranged the entire way
to Canada now. In the Senate, recent orders had
been issued making it a grave offense for anybody in
the Northern region to aid an escaped slave, so
fugitives were hunted where nobody had bothered them
earlier. Only in Canada could they be totally
secure. (Adapted from Parker & Scannell, 1969,
Silver Series, Story Number 15, paragraph 1.)


The Underground Railroad: A. was used as an escape
route B. was extended half-way to Canada C.
was the fastest way to travel D. was ordered by the
Congress.


Cued Inferential Meaning subtest (CIM). The correct answers for these items
required the use of more than one sentence of a paragraph to infer meaning
that was synonymous with or identical to the meanings of the paragraph
headings. The following story paragraph and two-word heading, along with a
sample multiple-choice CIM test item pertaining to this story paragraph,
were used in the experiment with the high-ability readers:


Letter: Escape
To be assured her brothers would be prepared, she
had prepared a message in advance. Since specific
officials examined all of the slaves’ mail, Harriet’s
message was addressed to a man named Jacob
Johnson, who secretly assisted the Underground
Railroad, and who was one of the relatively few free
black men in Maryland. However, even Jacob’s
mail might be searched, so Harriet had to be
cautious. Her message stated: “Inform my brothers
to be always devoted to prayer, and when the sturdy
aged fleet of vigor glides along to be prepared to
unite aboard.” (Adapted from Parker & Scannell,
1969, Silver Series, Story Number 15, paragraphs 4
and 5.)


Harriet’s code which told her brothers to “be prepared
to unite aboard,” meant: A. to beware of
specific officials B. to get ready to escape C. to visit
her parents D. to contact Jacob.


Noncued Sentence Meaning subtest (NSM). The correct answers for these items
required the use of one sentence of a paragraph to comprehend meaning that
was not synonymous with or identical to the meanings of the paragraph
headings. The following story paragraph and two-word heading, along with a
sample multiple-choice NSM test item pertaining to this story paragraph,
were used in the experiment with the low-ability readers:


Wish: Mirror
An unusual longing has come over me. Each day
the mirror attracts me. It draws me nearer and
nearer. Ever nearer, I know what I must do.
First I will write this. Then I will enter the mirror.
No matter what occurs there, I must proceed. May
the gods take heed of me. (Adapted from Parker &
Scannell, 1969, Tan Series, Story Number 5,
paragraph 8.)


What did the man want the gods to do? A. give him
pity B. heed him C. be afraid of him D. heal him.


Cued Sentence Meaning subtest (CSM). The correct answers for these items
required the use of one sentence of a paragraph to comprehend meaning that
was synonymous with or identical to the meanings of the paragraph headings.
The following story paragraph and two-word heading, along with a sample CSM
test item pertaining to this story paragraph, were used in the experiment
with the low-ability readers:


Moving: Eyes 

The room was totally still. But . . . the mirror . . . I
saw that the blossoms and leaves of the mirror were 
stirring! And amongst the leaves, the unusual, tiny 
dwarfs were laughing at me! I could not budge for 
fright. It couldn’t be! I had to regain my senses. 
My eyes must be weary. I shut them. But when I 
blinked them again, the vines were still stirring. 
(Adapted from Parker & Scannell, 1969, Tan Series, 
Story Number 5, paragraphs 4 and 5.) 


When the man opened his eyes: A. he could not 
move B. he could not see C. part of the mirror was 
still stirring D. the mirror had disappeared. 


Cloze Recall Test 


A Cloze Recall Test, modified as follows, was ad- 
ministered to subjects at each reading level 1 week after 
the experimental treatments. To keep the test at a 
moderate level of reading difficulty, blanks were in- 
serted for every other word that had been modified in 
the construction of the stories. The omitted low-fre- 
quency words or their synonyms were accepted as cor- 
rect answers to the modified cloze test. 
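
The construction and scoring of the modified cloze test can be sketched as follows. This Python sketch is illustrative only: the function names are hypothetical, starting the alternation with the first modified word is an assumption, and the synonym lists would have to be supplied by the scorer.

```python
from typing import Dict, Iterable, List, Tuple


def make_cloze(words: List[str],
               modified_positions: Iterable[int]) -> Tuple[List[str], Dict[int, str]]:
    """Blank every other modified word and return (test text, answer key).

    Assumption: the alternation starts with the first modified word.
    """
    blanked = set(sorted(modified_positions)[::2])
    text = ["_____" if i in blanked else w for i, w in enumerate(words)]
    key = {i: words[i] for i in sorted(blanked)}
    return text, key


def score_cloze(responses: Dict[int, str],
                key: Dict[int, str],
                synonyms: Dict[str, List[str]]) -> int:
    """Count blanks answered with the omitted word or an accepted synonym."""
    correct = 0
    for position, target in key.items():
        answer = responses.get(position, "").strip().lower()
        accepted = {target.lower()} | {s.lower() for s in synonyms.get(target, [])}
        if answer in accepted:
            correct += 1
    return correct
```

The score is the number of blanks filled with the omitted word or one of its accepted synonyms.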


Procedure 


The SRA Reading Placement Test was given to all students. Based on the
placement test score, each [...] was held constant. [...] of Experiment 1,
in which the 1,125-word story was used, was 20 minutes. The corresponding
time for [...] each subject took a multiple-choice comprehension test.
Fifteen minutes were allotted to students, which was ample time for all of
them to complete this test. On Day 3, 1 week after Day 2, each student took
the Cloze Recall Test. Thirty minutes were allotted to all students to
complete this test.


Results 


In each experiment, the encoding data, the 
data from the comprehension test and its 
four subtests, and the data from the Cloze
Recall Test were analyzed. Two chi-square
analyses were performed on the encoding
data. One of these chi-square analyses was
performed on the number of sentences
generated in the G, GR1, and GR2 groups. A
generated sentence was defined as any
written sentence that did not duplicate a
sentence of the text. The other chi-square
was performed on the number of paragraph
headings used in the generated sentences. A
Treatment × Sex multivariate analysis of
variance was computed on the data of the
four comprehension subtests. A Treatment
× Sex univariate analysis of variance was
then computed on the data of each of the 
four comprehension subtests, the compre- 
hension test as a whole, and the recall test. 
Finally, planned comparison tests were used 
in each experiment to test the four hypoth- 
eses about comprehension and recall. 
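
For readers unfamiliar with planned comparisons, the F statistic for a single contrast among treatment means can be sketched in Python as below. This is a generic textbook formulation, not the authors' computation: the article does not report its contrast weights, cell sizes, or error mean squares, so the function name and every input shown are hypothetical.

```python
from typing import Sequence


def contrast_f(means: Sequence[float],
               ns: Sequence[int],
               weights: Sequence[float],
               ms_error: float) -> float:
    """F ratio (1 numerator df) for a planned contrast among group means.

    F = (sum of w_i * M_i)^2 / (MS_error * sum of w_i^2 / n_i),
    where the weights w_i must sum to zero.
    """
    if abs(sum(weights)) > 1e-9:
        raise ValueError("contrast weights must sum to zero")
    estimate = sum(w * m for w, m in zip(weights, means))
    scale = sum(w * w / n for w, n in zip(weights, ns))
    return estimate ** 2 / (ms_error * scale)


# Hypothetical three-group linear trend (weights -1, 0, +1).
f_value = contrast_f(means=[10.0, 12.0, 14.0], ns=[10, 10, 10],
                     weights=[-1, 0, 1], ms_error=4.0)
```

A comparison of the form ½(R1 + R2) versus Cs would correspond to weights of 0.5, 0.5, and -1 on those three treatments and 0 on the others.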


Encoding 


As intended, in each experiment the children
in the generation treatments, G, GR1,
and GR2, constructed and wrote original
sentences for most of the paragraphs of each
story. The percentages of possible sentences
generated and written in the G, GR1, and
GR2 groups were (a) for the higher ability
readers, 87, 90, and 90, respectively, and (b)
for the lower ability readers, 73, 77, and 81,
respectively. In both experiments, the
chi-square analyses of the number of sentences
generated indicated that, as intended, there
were no statistically significant treatment
differences (p > .05) among the G, GR1, and
GR2 groups.

Also as intended, there were statistically 
significant treatment mean differences in 
each experiment in the number of words 
from the two-word paragraph headings that 
were used by the children in their generated 
sentences. The percentages of possible
words used in the G, GR1, and GR2 groups
were (a) for the high-ability readers, 38, 53,
and 78, respectively, and (b) for the
low-ability readers, 24, 43, and 58, respectively.
These mean differences were statistically
significant for the high readers, χ² = 225.6
(p < .01), and for the low readers, χ² = 104.3
(p < .01).

These results indicate that the treatments
produced their intended effects on encoding.
Original sentences were written for most of
the paragraphs in every treatment where the
generative instructions appeared. The
number of paragraph headings used in the
generated sentences increased significantly
from the G to the GR1 to the GR2
treatments. However, the low-ability children
used fewer paragraph headings than did the
high-ability children in the GR1 and GR2
groups.


Comprehension and Recall 


A Treatment × Sex multivariate analysis of variance, computed on the data
of the four comprehension subtests, was statistically significant in the
experiment with the high readers, F(7, 224) = 11.30, p < .01, and in the
experiment with the low readers, F(7, 232) = 14.17, p < .01. In the
univariate analyses of variance, which were computed for each of the four
comprehension subtests of each experiment and for the comprehension test as
a whole for each experiment, the treatment effects were each statistically
significant (p < .01). For the experiments with the high and low readers,
respectively, the results were (a) for the NIM subtest, F(7, 224) = 45.16
and F(7, 232) = 55.30; (b) for the CIM subtest, F(7, 224) = 50.00 and
F(7, 232) = 46.90; (c) for the NSM subtest, F(7, 224) = 46.04 and
F(7, 232) = 45.87; (d) for the CSM subtest, F(7, 224) = 33.22 and
F(7, 232) = 26.22; and (e) for the comprehension test [...] interactional
effect between treatment and sex (p > .05) on any comprehension test or
subtest.

The univariate analyses of variance for the recall test data also indicated
a statistically significant treatment effect for the experiment with the
high readers, F(7, 224) = 13.91, p < .01, and for the experiment with the
low readers, F(7, 232) = 17.86, p < .01. In each experiment, there was
neither a primary sex effect (p > .05) nor an interaction effect between
treatment and sex (p > .05).

In sum, the multivariate analyses of variance and the univariate analyses
of variance consistently indicated a statistically significant treatment
effect on the comprehension test as a whole (p < .01), on each of the four
comprehension subtests (p < .01), and on the recall test (p < .01) for each
experiment.

Planned comparisons were used as follows 
to test the four hypotheses: 

Hypothesis 1. Hypothesis 1 predicted the following linear trend among
treatment means on the comprehension test and subtests: GR2 > GR1 > G > R2
> R1 > Cs > Ch, with Ch > Cu on the recall test. The planned comparisons
testing this predicted linear order among the treatment means were each
statistically significant on the data of the comprehension test as a whole
for the high and low readers, respectively: F(1, 203) = 143.10, p < .01 and
F(1, 210) = 213.13, p < .01. It was also statistically significant for each
of the four comprehension subtests for the high and low readers,
respectively: NIM subtest, F(1, 203) = 96.65, p < .01 and F(1, 210) =
123.07, p < .01; CIM subtest, F(1, 203) = 153.93, p < .01 and F(1, 210) =
116.61, p < .01; NSM subtest, F(1, 203) = 94.34, p < .01 and F(1, 210) =
103.22, p < .01; and CSM subtest, F(1, 203) = 65.23, p < .01 and F(1, 210)
= 72.89, p < .01. And it was statistically significant for the recall test
for the high and low readers, respectively: F(1, 232) = 95.08, p < .01 and
F(1, 240) = 115.62, p < .01. The treatment means and standard deviations
are indicated in Table 1. These results support Hypothesis 1 without
exception.

Hypothesis 2. Hypothesis 2 predicted that the reading of a story with
paragraph headings (R1 and R2) increases story comprehension and recall in
comparison to the control treatments in which the same story is read
without the paragraph headings (Cs).

The results supported the hypothesized differences. The results of the
planned comparisons tests, ½(R1 + R2) versus Cs, on the data of the
comprehension test as a whole were statistically significant for the high
and low readers, respectively: F(1, 203) = 25.89, p < .01 and F(1, 210) =
29.23, p < .01. The results of the same planned comparisons tests were also
statistically significant on each of the four comprehension subtests and on
the recall test for the high and low readers, respectively: NIM subtest,
F(1, 203) = 15.57, p < .01 and F(1, 210) = 14.88, p < .01; CIM subtest,
F(1, 203) = 21.25, p < .01 and F(1, 210) = 11.59, p < .01; NSM subtest,
F(1, 203) = 17.93, p < .01 and F(1, 210) = 16.59, p < .01; CSM subtest,
F(1, 203) = 11.46, p < .01 and F(1, 210) = 11.30, p < .01; and recall test,
F(1, 232) = 8.51, p < .01 and F(1, 240) = 21.16, p < .01.


Table 1
Treatment Means and Standard Deviations for Comprehension and Recall Test Scores

                                                    High-ability readers         Low-ability readers
                                                Comprehension   Retention    Comprehension   Retention
                                                 (28 items)   (131 blanks)    (20 items)    (38 blanks)
Treatment                                           M     SD      M     SD      M     SD      M     SD

Two-word heading, generative instructions (GR2) 18.67   4.09  67.66  21.84  13.16   1.88  18.29   5.22
One-word heading, generative instructions (GR1) 16.80   4.14  60.83  20.19  12.10   2.91  17.93   5.39
Generative instructions (G)                     15.43   4.86  51.26  23.59  12.03   3.27  15.32   6.27
Two-word heading (R2)                           15.50   5.11  50.17  23.97  10.65   2.91  14.80   5.59
One-word heading (R1)                           15.97   4.41  45.87  21.55  10.00   3.66  13.84   5.95
Control, same story (Cs)                        10.77   4.51  35.06  19.89   7.19   2.85   9.13   3.92
Control, headings only (Ch)                      7.10   2.43  31.76  11.69   4.90   1.47   8.99   3.58
Control, unrelated story (Cu)                     ...    ...  30.50  10.94    ...    ...    ...    ...

Hypothesis 3. This hypothesis predicted
that the generation of a sentence for each
paragraph of a story (G) produces greater
story comprehension and recall than does
the reading of the same story with paragraph
headings (R1 and R2) or without the
paragraph headings (Cs).

The results of the planned comparisons, G versus Cs, on the data of the
comprehension test as a whole were statistically significant for the high
and low readers, respectively: F(1, 203) = 79.36, p < .01 and F(1, 210) =
208.38, p < .01. The results of the planned comparisons between G and Cs
were also statistically significant on each of the four comprehension
subtests and recall test for the high and low readers, respectively: NIM
subtest, F(1, 203) = 55.35, p < .01 and F(1, 210) = 101.38, p < .01; CIM
subtest, F(1, 203) = 56.25, p < .01 and F(1, 210) = 106.93, p < .01; NSM
subtest, F(1, 203) = 55.58, p < .01 and F(1, 210) = 114.64, p < .01; CSM
subtest, F(1, 203) = 27.30, p < .01 and F(1, 210) = 73.17, p < .01; and
recall test, F(1, 232) = 238.44, p < .01 and F(1, 240) = 196.98, p < .01.

There was a statistically significant difference favoring the generation
treatment (G) over the paragraph heading treatments (R1 and R2) on the
comprehension test as a whole for low readers, F(1, 210) = 8.57, p < .01,
but not for the high readers (p > .05). The above treatment difference was
also significant for two of the comprehension subtests for the low readers,
that is, for the CIM subtest, F(1, 210) = 6.58, p < .05, and for the NSM
subtest, F(1, 210) = 4.36, p < .05, and was not significant for any of the
comprehension subtests for the high readers (p > .05). There was no
statistically significant difference between the G group and the R1 and R2
groups on the recall test for either high or low readers (p > .05).

Hypothesis 4. Hypothesis 4 predicted
that the generation-retrieval treatments
(GR1 and GR2) produce greater story
comprehension and recall than do the retrieval
cue treatments (R1 and R2) or the generation
treatment (G).

The first set of planned comparisons designed to test this hypothesis
determined whether GR1 + GR2 was greater than R1 + R2. Each of these 12
planned comparisons supported this hypothesis. In the experiments with the
high and low readers, respectively, the generation-retrieval treatment
means (GR1 + GR2) were statistically significantly greater than the
retrieval treatment means (R1 + R2) for the comprehension test as a whole:
F(1, 203) = 10.05, p < .01 and F(1, 210) = 23.53, p < .01; for each of the
four comprehension subtests: NIM subtest, F(1, 203) = 7.97, p < .01 and
F(1, 210) = 17.7, p < .01; CIM subtest, F(1, 203) = 5.53, p < .05 and
F(1, 210) = 17.87, p < .01; NSM subtest, F(1, 203) = 6.51, p < .05 and
F(1, 210) = 9.68, p < .01; CSM subtest, F(1, 203) = 4.38, p < .05 and
F(1, 210) = 6.98, p < .01; and for the recall test: F(1, 232) = 20.03,
p < .01 and F(1, 240) = 16.77, p < .01.

The second set of planned comparisons designed to test Hypothesis 4
determined whether (GR1 + GR2) was greater than G. For the high readers,
all of the planned comparisons supported the hypothesis. It was supported
for the comprehension test as a whole: F(1, 203) = 9.46, p < .01; for each
of the four comprehension subtests: NIM subtest, F(1, 203) = 5.53, p < .05;
CIM subtest, F(1, 203) = 7.26, p < .01; NSM subtest, F(1, 203) = 6.08,
p < .05; CSM subtest, F(1, 203) = 6.37, p < .05; and for the recall test:
F(1, 232) = 12.82, p < .01. For the low readers, only the planned
comparison on the recall test data supported the hypothesis, F(1, 240) =
9.14, p < .01.

These results of the tests of Hypothesis 4 indicate that the addition of
generative instructions to paragraph headings (GR1 + GR2 vs. R1 + R2)
enhanced comprehension and recall for both high and low readers. However,
the addition of paragraph headings to generative instructions,
½(GR1 + GR2) versus G, enhanced recall for both groups but enhanced
comprehension only for the high-ability group.

In sum, the results of the planned com- 
parisons tests supported the following pre- 
dicted treatment mean differences: First, 
the hypothesized linear trend among all of 
the treatments, GR: > GR: > G> R2>Rı 
> C, > Cy, occurred as predicted on the 
comprehension test as a whole, on the four 
comprehension subtests, and on the Cloze 
Recall Test, in which there was the addi- 
tional C, treatment. Second, each of the 
retrieval and generation treatment means 
(R1, R2, and G) was statistically significantly 
greater than each of the control treatment 
means on the comprehension test as a whole, 
on each of the four comprehension subtests, 
and on the recall test. Finally, on each of the 
six dependent variables, the treatment 
means that involved a combination of gen- 
eration and retrieval (GR1 and GR2) were 
statistically significantly greater than those 
means that involved paragraph headings 
alone. All of the above results occurred with 
high-ability readers and again with low- 
ability readers. 

In addition, for the high readers only, on 
each of the six dependent variables, the 
means of the treatments that involved a 
combination of generation and paragraph 
headings were statistically significantly 
greater than the means of the generation 
treatments. For the low readers, the above 
difference was obtained only on the recall 
test, and the generative processing instruc- 
tions (G) enhanced comprehension com- 
pared with paragraph headings used as re- 
trieval cues (R1 or R2). 


Discussion 


In two experiments, where time to learn 
was held constant across all treatments, one- 
or two-word paragraph headings were in- 
serted in the text to serve as retrieval cues. 
Instructions to construct original sentences 
summarizing events described in each 
paragraph of a text were used to increase the 
generative processing of memories. The 
combination of the two conditions, inserted 
paragraph headings and instructions to 
generate original sentences using the para- 
graph headings, was used to facilitate both 
retrieval and generative processing of in- 
formation. 

To test predictions from the model de- 
scribed above, it was necessary to establish 
first that the treatments affected encoding 
as intended. The encoding data collected in 
the treatments where sentences were written 
by the children (G, GR1, and GR2) indicated 
that the number of retrieval cues used to 
construct original elaborative sentences in- 
creased significantly (p < .01) from the G to 
the GR1 to the GR2 treatments, as intended. 
The instructions to generate original sen- 
tences also produced the intended effects, 
resulting in the writing of original sentences 
for most of the paragraphs in every treat- 
ment where the instructions appeared, with 
the high-ability learners constructing more 
sentences than the low-ability learners. A 
chi-square test indicated no statistically 
significant differences across the G, GR1, and 
GR2 treatments in the number of original 
sentences written. All of the treatments 
produced their intended effects on encod- 
ing. 

As the above results indicate, the treat- 
ments seem to be influencing encoding. 
However, to determine whether more than 
attention was affected by the treatments, 
four subtests of comprehension were con- 
structed. Two of the subtests measured 
comprehension of information cued by the 
paragraph headings (the CIM and CSM 
subtests), and the other two subtests mea- 
sured comprehension of information not 
cued by the paragraph headings (the NIM 
and NSM subtests). If only attention were 
facilitated, the predicted rank order among 
the treatment means would have occurred 
only on the subtests that measured com- 
prehension of information cued by the 
paragraph headings, the CIM and CSM 
subtests. However, the predicted rank order 
among the treatment means also occurred on 
the subtests that measured comprehension 
of noncued information, the NIM and NSM 
subtests. The facilitation of noncued in- 
formation implies that, in addition to at- 
tention, encoding was influenced by the 
treatments. 

Because the treatments affected encoding 
as intended, it became possible to test the 
predictions of the model, all of which per- 
tained to the effects of encoding on com- 
prehension and recall. Without exception, 
the predicted linear trend among the treat- 
ment means occurred on each of the 12 tests 
of comprehension and on each of the 2 tests 
of recall (GR2 > GR1 > G > R2 > R1 > C> 
Ch > C). With the exceptions noted in the 
Results section, the combination of para- 
graph headings and generative-processing 
instructions produced the greatest compre- 
hension and recall, followed in turn by gen- 
erative instructions alone, paragraph head- 
ings alone, and then by the control treat- 
ments. These data are consistent with a 
model of comprehension in which memories 
are retrieved and used to construct meaning 
for text. 

The treatments with the inserted para- 
graph headings, R1 and R2, produced greater 
comprehension and recall than did the 
reading of the same stories without the 
paragraph headings (p < .01) on every re- 
spective test used in the experiments with 
the high-ability readers and with the low- 
ability readers. These results provide sup- 
port for the hypothesis that cues for the re- 
trieval of relevant information can facilitate 
comprehension and recall of text. 

The facilitation of comprehension as well 
as recall by the paragraph headings may 
mean only that the tests of comprehension 
and recall are measuring the same process. 
Alternatively, the findings may mean that, 
in agreement with the generative model, the 
recall of relevant information and the con- 
struction of meaningful elaborations for text 
are closely related processes in comprehen- 
sion, both of which are facilitated by the in- 
sertion of retrieval cues. 

In the treatments with instructions to 
generate original sentences to elaborate the 
meanings of the paragraphs, the G treat- 
ments, most readers generated the sen- 
tences, with the high-ability readers gener- 
ating a greater percentage of sentences than 
did the low-ability readers. With both 
groups of readers, the generative instruc- 
tions, compared with the control procedures, 
facilitated comprehension and recall. In 
addition, the comprehension of the low- 
ability readers was enhanced more by gen- 
erative instructions than by inserted para- 
graph headings (G vs. R1 and R2). It seems 
that the low-ability readers may not spon- 
taneously construct verbal elaborations for 
text as readily as do the high-ability readers. 
The generative instructions facilitated the 
comprehension of both groups and were 
especially helpful with the low-ability 
readers. 

The combination of generative instruc- 
tions and retrieval cues (GR1 and GR2) 
consistently produced greater comprehen- 
sion and greater recall for both groups of 
readers than did the inserted retrieval cues 
alone (R1 and R2) and greater recall for both 
groups than did the generative instructions 
alone (G). In addition, among the high- 
ability readers, the combination treatments 
(GR1 and GR2) produced greater compre- 
hension than did the generative instructions 
alone (G). These results support the gen- 
erative model but indicate that some of the 
treatments work somewhat differently with 
high-ability readers and low-ability readers. 
The model predicts, as found, that with both 
groups of learners, two ways to enhance the 
process of using memories to generate 
meaning for text include the stimulation of 
retrieval and the stimulation of the con- 
struction of elaborative sentences. The 
model also predicts, as found, that the ad- 
dition of generative instructions to inserted 
retrieval cues increases comprehension for 
both groups of readers, while the addition of 
retrieval cues to generative instructions in- 
creases recall for both groups of readers and 
comprehension for the high-ability readers. 
However, the model did not predict that the 
treatment effects depend on the reading 
abilities of the students in the following way. 
Although the predicted rank order among 
the treatment means occurred with both 
groups of readers, the paragraph headings 
instructions in- 
creased comprehension more for the high- 
the low-ability 
readers. The generative instructions facil- 


meaningful elaborations can increase read- 
ing comprehension and recall. These data 
are consistent with the generative model of 
learning with comprehension and with the 
results of a series of previous studies (Marks 
et al., 1974; Wittrock & Carter, 1975; Wit- 
trock et al., 1975). 

The data imply that models of learning 
with comprehension should attend to the 
retrieval processes and to the generative 
processes involved in the construction of 
meaning for text. Although the retrieval of 
memories of experience and the elaboration 
of meaning for text have sometimes been 
viewed as primarily independent cognitive 
processes, they seem to be complementary 
and interdependent parts of the generative 
processes involved in comprehension. Our 
data indicate that the generative processes 
of comprehension can be facilitated with 
appropriate instructions and retrieval cues, 
whose effects depend on the different abili- 
ties of the learners. 


Convergent and Discriminant Validity of Five Classroom 
Observation Systems: Testing a Model 


Gary D. Borich, David Malitz, and Cherry L. Kugle 


University of Texas at Austin 


This study evaluated the convergent and discriminant validity of selected be- 
haviors measured by five classroom observation systems. Five pairs of raters 
were trained in using each observation system, and data were obtained from 
the coding of three 50-minute videotapes of classroom interaction for each of 
12 teachers. Comparisons of the five systems with one another yielded 23 
categories that measured similar behaviors across two or more systems. Ap- 
proximately half of the 23 categories passed all tests for both convergent and 
discriminant validity. The implications of these results for future process- 
product studies and for the future use of the convergent and discriminant va- 
lidity model are discussed. 


Numerous instruments have been devel- 
oped to observe classroom behavior sys- 
tematically. These instruments typically 
consist of a number of categories of 
teacher-student behavior, which an observer 
tallies or rates periodically while observing 
classroom interaction. For the greater part 
of a decade, researchers have used such in- 
struments to investigate the relationship 
between teacher behavior and student out- 
come, but this effort has yielded relatively 
few consistent findings.¹ While many pos- 
sible reasons for the dearth of consistent 
findings can be advanced, two which must be 
considered are (a) that the research model or 
theory implicit in process-product investi- 
gations may be inadequate or too simplistic 
to uncover such relationships and (b) that 
psychometric weaknesses within instru- 
ments used by the researchers may obscure 
any underlying relationships that do exist. 

At present, there is no a priori reason to 
suspect one of these possibilities over any 
other. However, as process-product studies 
themselves confirm (Brophy & Evertson, 
1976; Good & Grouws, 1975; McDonald et 
al., 1975; Stallings & Kaskowitz, 1974), there 
has been a conspicuous lack of validity 
studies of the research instruments used, 
especially instruments to measure teacher 
behavior. Taking note of this, Borich 
(1977b) detailed several of the most salient 
sources of invalidity afflicting observational 
measures of teacher behavior, but could not 
provide empirical data as to the actual effect 
of these sources of invalidity on such in- 
struments. The present study undertook to 
determine the extent to which one of these 
sources of invalidity, the lack of convergent 
and discriminant validity, was present in five 
classroom observation systems. The valid- 
ity model reported by Campbell and Fiske 
(1959) was employed, which required that 
both convergent and discriminant validity 
be demonstrated. 

Convergent validity is a confirmation of 
traits (or variables or categories) by inde- 
pendent measuring methods as indicated by 
significant correlation between two methods 
(or systems) measuring the same trait. 
Discriminant validity is a requirement that 
“the correlation between different measures 
measuring the same trait exceed (a) the 
correlations obtained between that trait and 


1 See Borich (1977a, chap. 6) for the results of five 


large-scale studies that have investigated the relation- 
ship between teacher behavior and student outcome and 
especially pages 76-78 for a table of consistent and in- 
consistent findings across these studies. 
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any other trait not having method in com- 
mon and (b) the correlations between dif- 
ferent traits which happen to employ the 
same method” (Borich & Malitz, 1975). By 
determining intercorrelations among cate- 
gories ina multitrait-multimethod matrix, 
one can identify categories that pass speci- 
fied tests of convergent and discriminant 
validity. These procedures were applied to 
the following data in order to ascertain the 
construct validity of five classroom obser- 
vation systems. 


Method 


manuals, 

The five systems employed for this study were se- 
lected from Simon and Boyer’s (1970) Mirrors for Be- 
havior. The systems were (a) the Observation Schedule 


Spaulding Teacher Activity Rating Schedule (STARS; 
Spaulding, 1967), (c) Flanders's Systems of Interaction 
Analysis (Flanders, 1971), (d) CERLI Verbal-Behavior 
Classification System (CVC; Cooperative Educational 
Research Laboratory, Note 2), and (e) the Classroom 
Communication Observational System (CCO; Withall, 
Lewis, & Newell, 1961). These systems were selected 
because of their availability to the educational research 


On completion of training, system coders, using their 
respective systems, rated three trial videotapes of the 


Table 1 
Coder Reliability Before and During the Study* 

System       Highest prestudy reliability   Median for 36 tapes 
STARS        .72                            .72 
OScAR 5      .88                            .91 
Flanders     .91                            .93 
CVC          .79                            .82 
CCO          .86                            .86 

Note. STARS = Spaulding Teacher Activity Rating Schedule; 
OScAR 5 = Observation Schedule and Record; Flanders = 
Flanders's Systems of Interaction Analysis; CVC = Cooperative 
Educational Research Laboratory, Inc., Verbal-Behavior 
Classification System; CCO = Classroom Communication 
Observational System. 

* Scott's coefficient. 


same behavior. From these comparisons, two catego- 
ries were paired across the Flanders and OScAR 5 sys- 
tems, three categories were paired across the STARS 
and CVC systems, four categories were paired across the 
OScAR 5 and CCO systems, two categories were paired 


tems, for a total of six two-system comparisons. In 
addition, there was one three-system comparison: Two 
categories were compared across Flanders, CCO, and 
OScAR 5. A description of the behaviors comprising 
these comparisons appears in the Appendix. 

In certain cases, a single variable from one system was 
paired with several variables in another system. This 
procedure was most commonly employed when a subset 
of categories on one system was encompassed by a single 


Once the categories to be investigated had been 
identified, ratings from each pair of coders were aver- 
aged and Pearson product-moment correlations com- 
puted. These correlations were used to construct seven 
multitrait-multimethod matrices. For each matrix, a 
heterotrait-heteromethod block was formed with those 
values in which categories may or may not coincide but 
systems differ. A heterotrait-heteromethod block is 
illustrated in Figure 1. 

² The coding procedure for OScAR 5 is to record each 
teacher's utterance as it occurs without constraining the 
coding to a specified time interval. 

For each matrix, a diagonal (called the validity di- 
agonal) is formed through the heterotrait-hetero- 
method block by the series of cells in which categories 
coincide but systems differ. Values in the validity di- 
agonal that are significantly different from zero are 
evidence for convergent validity. Discriminant validity 
must be assessed in two steps. First, each validity value 
must be compared with all values in its row and column 
in the heterotrait-heteromethod block to determine 
whether the correlations between different methods of 
measuring the same category exceed correlations be- 
tween that category and other categories not having 
method in common. Second, the heterotrait-mono- 
method triangles are examined to determine whether 
the correlation between different methods of measuring 
the same category exceeds correlations between that 
category and other categories that have method in 
common. This step is completed by comparing each 
category’s validity diagonal value with values in the 
heterotrait-monomethod triangles in which that cate- 
gory is involved. In Figure 1, the validity diagonal for 
Category A is significant at the .05 level and, therefore, 
can be taken as evidence for convergent validity. Also, 
Category A presents good evidence for discriminant 
validity, since its validity diagonal value exceeds all of 
the values specified in the two-step procedure outlined 
above. Category B, on the other hand, indicates neither 
convergent nor discriminant validity. 

This two-step procedure was carried out for each 
validity diagonal value in each of the seven matrices, 
and the results are entered in Tables 2-9. 
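
The two-step procedure can be sketched in code. The following is a minimal illustration, not the authors' own computation: it handles only a two-system matrix, takes its correlations from the Figure 1 caption, and uses the critical value r = .325 (df = 35) reported in the Results. The function name `evaluate_traits`, the data layout, the placement of -.10 and -.12 within the heteromethod block, and the use of absolute values when comparing magnitudes are all assumptions made for the example.

```python
R_CRIT = 0.325  # r significant at .05 with df = 35, as reported in the Results


def evaluate_traits(corr, methods, traits, r_crit=R_CRIT):
    """Return {trait: (convergent, step1, step2)} for a two-method matrix.

    corr maps frozenset({(method, trait), (method, trait)}) to a correlation.
    Comparing the validity value against absolute magnitudes is an assumption.
    """
    m1, m2 = methods

    def r(a, b):
        return corr[frozenset((a, b))]

    results = {}
    for t in traits:
        others = [u for u in traits if u != t]
        validity = r((m1, t), (m2, t))
        # Step 1: heterotrait-heteromethod values in the validity value's
        # row and column (different trait, different method).
        hetero = [r((m1, t), (m2, u)) for u in others]
        hetero += [r((m1, u), (m2, t)) for u in others]
        # Step 2: heterotrait-monomethod values involving this trait
        # (different trait, same method).
        mono = [r((m, t), (m, u)) for m in methods for u in others]
        convergent = validity > r_crit
        step1 = all(validity > abs(v) for v in hetero)
        step2 = all(validity > abs(v) for v in mono)
        results[t] = (convergent, step1, step2)
    return results


S1, S2 = "System 1", "System 2"
corr = {
    frozenset({(S1, "A"), (S1, "B")}): 0.23,   # monomethod triangle, System 1
    frozenset({(S2, "A"), (S2, "B")}): -0.14,  # monomethod triangle, System 2
    frozenset({(S1, "A"), (S2, "A")}): 0.43,   # validity diagonal, Category A
    frozenset({(S1, "B"), (S2, "B")}): -0.01,  # validity diagonal, Category B
    frozenset({(S1, "A"), (S2, "B")}): -0.10,  # off-diagonal (placement assumed)
    frozenset({(S1, "B"), (S2, "A")}): -0.12,  # off-diagonal (placement assumed)
}
results = evaluate_traits(corr, (S1, S2), ("A", "B"))
print(results)
```

With two categories, both off-diagonal heteromethod values fall in the row or column of each validity value, so the assumed placement of -.10 and -.12 does not affect the outcome: Category A passes all three checks and Category B passes none, in line with the description above.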


Results 


Seven matrices resulted from the process 
of comparing categories and groups of cate- 
gories across the five systems. Six of these 
matrices compared categories across two 
systems. The seventh matrix involved three 
of the systems. No matching categories 
were found to exist across any four or all five 
of the systems. A category or group of 
categories that was found to match across 
two or more systems will be referred to as a 
comparison category (CC). Twenty-three 
such CCs were created and will be referred 
to by number (i.e., CC1 through CC23). The 
Appendix lists each of these 23 CCs and the 
constituent system categories that comprise 
each CC. Of the six two-system matrices, 
five contained four or fewer CCs. The 
multitrait-multimethod matrices for these 
five two-system comparisons are shown in 
Tables 2-6. The three-system matrix is 
presented in Table 7. One two-system ma- 
trix contained six CCs and is shown in Table 
8. Since this last matrix is somewhat cum- 
bersome to evaluate in its raw form, a sum- 
mary table (Table 9) was constructed to aid 
in its evaluation. 

[Figure 1 matrix: System 1 (A = Accepts, B = Questions) by System 2 (A = Values, B = Delves).] 

Figure 1. Simplified illustration of the validation 
model. (The validity diagonal = .43, -.01; the hetero- 
trait-heteromethod block = .43, -.01, -.10, -.12; the 
monomethod triangles = .23 and -.14, respectively. 
Interjudge reliabilities appear in parentheses.) 

Table 2 
Matrix of Flanders’s Systems of Interaction Analysis Versus the Classroom 
Communication Observational System (CCO) 

Comparison            Flanders              CCO 
category (CC)       CC1      CC2       CC1      CC2 
Flanders 
  CC1 
  CC2              .6420 
CCO 
  CC1              .7699    .2779 
  CC2              .5620    .6620     .3404 

Note. CC1 = gives directions and CC2 = silence or confusion. 

Table 2 shows the matrix resulting from 
the matching of two categories across the 
Flanders and CCO systems. It can be noted 
that both CC1 (gives directions) and CC2 
(silence or confusion) pass the criterion for 
convergent validity, since both CCs have 
significant validity diagonal values (.7699 
and .6620, respectively; r.05 = .325, df = 35). 
Since both CCs pass the test for convergent 
validity, they may be examined for discri- 
minant validity. It will be recalled that de- 
termining discriminant validity is a two-step 
process. The first step involves comparisons 
of each CC’s validity diagonal value with the 
other values in its row and column in the 
heterotrait-heteromethod block. The sec- 
ond step requires comparison of the validity 
diagonal value for each CC with values in the 
heterotrait-monomethod triangles. For 
both CC1 and CC2, the validity value ex- 
ceeds the heterotrait-heteromethod values. 
Thus, both CCs meet the first criterion for 
discriminant validity, since in both cases, the 
correlation between different methods of 
measuring the same behavior exceeds cor- 
relations between that category and other 
categories not having method in common. 
In addition, the validity value exceeds the 
heterotrait-monomethod values. In other 
words, the correlation between different 
methods of measuring the same behavior 
exceeds correlations between that category 
and other categories having method in 
common—the second step. In summary, 
CC1 and CC2 pass all tests for convergent 
and discriminant validity. 

Table 3 
Matrix of Flanders’s Systems of Interaction Analysis Versus Observation Schedule and Record 
(OScAR 5) 

Comparison                 Flanders                       OScAR 5 
category (CC)       CC3      CC4      CC5        CC3      CC4      CC5 
Flanders 
  CC3 
  CC4             -.2247 
  CC5             -.0684    .1369 
OScAR 5 
  CC3              .8808   -.1399    .0399 
  CC4             -.1268    .8571    .2904     -.1331 
  CC5              .2210   -.1297    .0861      .1401   -.0384 

Note. CC3 = accepts feelings, praises, encourages; CC4 = criticizes or justifies authority; CC5 = student talk-initiation. 

Table 3 contains the results of matching 
categories across the Flanders and OScAR 
5 systems. Three CCs resulted from this 
comparison: CC3 (accepts feelings), CC4 
(criticizes), and CC5 (student talk-initia- 
tion). Examination of the validity diagonal 
reveals evidence for convergent validity for 
CC3 and CC4, since their values are signifi- 
cant. CC5 has a nonsignificant validity 
value and, therefore, need not be examined 
for discriminant validity. CC3 and CC4 
pass both the first and second steps for 
discriminant validity, since their validity 
diagonal values exceed all relevant values in 
both the heterotrait-heteromethod block 
and in the heterotrait-monomethod trian- 
gles. Thus, CC3 and CC4 pass all tests for 
convergent and discriminant validity. CC5 
lacks evidence for convergent validity, and 
therefore its discriminant validity need not 
be examined. 

Table 4 
Matrix of the Cooperative Educational Research Laboratory, Inc., Verbal-Behavior 
Classification System (CVC) Versus the Spaulding Teacher Activity Rating Schedule (STARS) 

Comparison                   CVC                          STARS 
category (CC)       CC6      CC7      CC8        CC6      CC7      CC8 
CVC 
  CC6 
  CC7              .7470 
  CC8              .1478    .0072 
STARS 
  CC6              .0420    .0279   -.2484 
  CC7              .0560    .1766   -.1525      .0121 
  CC8             -.1033   -.1476    .1618     -.1385    .1730 

Note. CC6 = asks for feelings; CC7 = gives feelings; CC8 = disagrees or disapproves. 

Table 4 shows the three CCs (CC6 [asks 
for feelings], CC7 [gives feelings], and CC8 
[disagrees or disapproves]) resulting from 
comparison of the CVC and STARS systems. 
None of these three CCs have significant 
validity diagonal values and therefore lack 
evidence of convergent validity. Discrimi- 
nant validity is also necessarily lacking and 
therefore need not be examined. 

Table 5 
Matrix of the Classroom Communication Observational System (CCO) Versus the Observation 
Schedule and Record (OScAR 5) 

Comparison                        CCO                                  OScAR 5 
category (CC)      CC15     CC16     CC17     CC19       CC15     CC16     CC17     CC19 
CCO 
  CC15 
  CC16            -.1576 
  CC17            -.0101    .1420 
  CC19             .2797   -.0773    .1237 
OScAR 5 
  CC15             .4480    .0130    .0973    .0410 
  CC16            -.3171    .1445    .0030   -.1983     -.0112 
  CC17            -.2213    .4642   -.0643    .0733      .2283    .1824 
  CC19             .5054   -.2953   -.1057    .4370      .2738   -.2061    .1677 

Note. CC15 = asks questions; CC16 = gives suggestions; CC17 = gives directions; CC19 = perfunctory agreement/disagreement. 
The mean for CCO's CC18 = 0.00. 

Table 5 indicates that four CCs (CC15 
[asks questions], CC16 [gives suggestions], 
CC17 [gives directions], and CC19 [per- 
functory agreement/disagreement]) resulted 
from comparison of the CCO and OScAR 5 
systems. Examination of the validity diag- 
onal values indicates that only CC15 and 
CC19 have significant values. However, 
both CC15 and CC19 fail the first step in the 
assessment of discriminant validity, since 
both are exceeded by the heterotrait-het- 
eromethod value of .5054. While the valid- 
ity values for these categories pass the sec- 
ond test by exceeding all values in the het- 
erotrait-monomethod triangle, they do not 
pass the first test for discriminant validity. 
Thus, while CC15 and CC19 show evidence 
for convergent validity, they show mixed 
results for discriminant validity. 

Table 6 
Matrix of the Spaulding Teacher Activity Rating Schedule (STARS) Versus the 
Observation Schedule and Record (OScAR 5) 

Comparison             STARS              OScAR 5 
category (CC)      CC20     CC21       CC20     CC21 
STARS 
  CC20 
  CC21            -.4358 
OScAR 5 
  CC20             .6165   -.3520 
  CC21            -.3314    .8598     -.2478 

Note. CC20 = restructuring and CC21 = telling, informing. 

Table 6 shows the results of the compari- 
son of STARS with OScAR 5. Two CCs 
(CC20 [restructuring] and CC21 [telling, 
informing]) resulted from this comparison, 
and both pass all tests for convergent and 
discriminant validity. 

Table 7 
Matrix of Flanders’s Systems of Interaction Analysis Versus the Classroom Communication 
Observational System (CCO) Versus the Observation Schedule and Record (OScAR 5) 

Comparison            Flanders               CCO               OScAR 5 
category (CC)      CC22     CC23       CC22     CC23       CC22     CC23 
Flanders 
  CC22 
  CC23            -.2247 
CCO 
  CC22             .5743   -.3616 
  CC23            -.2171    .7182     -.2757 
OScAR 5 
  CC22             .8808   -.1399      .5441   -.1840 
  CC23            -.1268    .8571     -.2923    .6811     -.1331 

Note. CC22 = accepts feelings, praises, encourages and CC23 = criticizes or justifies authority. 

When Flanders, CCO, and OScAR 5 sys- 
tems were compared, two CCs were found. 
The results of the comparison of CC22 (ac- 
cepts feelings, praises, encourages) and CC23 
(criticizes or justifies authority) are pre- 
sented in Table 7. Analysis of a three-sys- 
tem matrix proceeds in exactly the same 
manner as the analyses of a two-system 
matrix, except that instead of one validity 
diagonal to examine, there are now three 
(corresponding to the three system pairings). 
Examination of the three validity diagonals 
indicates all values are significant. Fur- 
thermore, it can be noted that each of these 
values exceeds the relevant values in the 
heterotrait-heteromethod blocks and in the 
heterotrait-monomethod blocks. Thus, 
both CC22 and CC23 pass all tests for con- 
vergent and discriminant validity in the 
Flanders, CCO, and OScAR 5 comparison. 

Lastly, comparison of the CVC and 
OScAR 5 systems resulted in the creation of 
six CCs (CC9, informs [facts]; CC10, informs 
[rules]; CC11, accepts facts and interpreta- 
tions; CC12, accepts feelings and plans; 
CC13, rejects facts and interpretations; and 
CC14, rejects feelings and plans). The cor- 
relation matrix for these comparisons is 
shown in Table 8, while a summary table of 
these data is presented in Table 9. Table 9 
shows the validity diagonal value for each 
CC. In addition, data are presented per- 
taining to each CC's discriminant validity 
(the highest value in the relevant parts of the 
heteromethod and monomethod blocks, and 
the number of times the validity value is 
exceeded in each of these blocks). Exami- 
nation of this table reveals that three CCs 
(CC10, CC12, and CC14) have nonsignificant 
validity values and, therefore, lack evidence 
for convergent validity. Of the remaining 
three CCs, all show good evidence of discri- 
minant validity, since their validity values 
are exceeded by none of the relevant het- 
eromethod or monomethod values. 


Table 8 
Matrix of the Cooperative Educational Research Laboratory, Inc., Verbal-Behavior 
Classification System (CVC) Versus the Observation Schedule and Record (OScAR 5) 

Comparison                             CVC                                                  OScAR 5 
category (CC)    CC9     CC10    CC11    CC12    CC13    CC14      CC9     CC10    CC11    CC12    CC13    CC14 
CVC 
  CC9 
  CC10          .0832 
  CC11          .5459   .1248 
  CC12          .0858   .5504   .2034 
  CC13          .1452   .0600   .4429  -.1568 
  CC14         -.2915  -.0350   .3095   .0805   .2508 
OScAR 5 
  CC9           .6746  -.1078   .1829  -.0350   .0058  -.2446 
  CC10          .3622   .1758  -.0938   .3550  -.0782  -.1198     .1043 
  CC11          .0236  -.1299   .6402  -.4387   .1818  -.3078     .0494  -.2702 
  CC12         -.0531   .0866  -.2327   .3088  -.1442  -.1466    -.1845   .2386  -.3163 
  CC13          .1696   .0200   .3779  -.2166   .6440   .0283     .0684   .0148   .2488  -.2880 
  CC14          .2008   .2917  -.0933   .3293   .0653   .1928    -.1467   .4183  -.5148   .2790  -.0209 

Note. CC9 = informs (facts); CC10 = informs (rules); CC11 = accepts facts and interpretations; CC12 = accepts feelings and 
plans; CC13 = rejects facts and interpretations; CC14 = rejects feelings and plans. 

Table 9 
Summary Data for the Cooperative Educational Research Laboratory, Inc., Verbal-Behavior 
Classification System (CVC) Versus the Observation Schedule and Record (OScAR 5) 

Comparison    Validity    Highest value    No. higher       Highest value    No. higher 
category      diagonal    in               than validity    in               than validity 
(CC)          value       heteromethod     value            monomethod       value 
CC9           .6146*       .3622           0                 .5459           0 
CC10          .1758        .3622           3                 .5504           4 
CC11          .6402*      -.4387           0                 .5459           0 
CC12          .3088       -.4387           3                 .5504           2 
CC13          .6440*       .3779           0                 .4429           0 
CC14          .1928        .3293           5                -.5148           6 

Note. CC9 = informs (facts); CC10 = informs (rules); CC11 = accepts facts and interpretations; CC12 = accepts feelings and 
plans; CC13 = rejects facts and interpretations; CC14 = rejects feelings and plans. 

* p < .05. 

Comparison of the five teacher observa- 
tion systems employed in this study pro- 
duced 23 CCs. Twenty-one of these CCs 
were involved in two-system comparisons 
and two in a three-system comparison. Of 
the 23 CCs, 13 (57%) showed evidence of 
convergent validity. Eleven of the 23 CCs 
(48%) passed tests for both convergent and 
discriminant validity. Thus, of the CCs that 
were analyzed, about half conformed to 
Campbell and Fiske’s (1959) criteria for 
convergent and discriminant validity. 


Discussion 


The purpose of this research has been to 
evaluate the convergent and discriminant 
validity of five classroom interaction systems 
that either have been used in studies relating 
teacher behavior to pupil outcome or are 
reasonable representations of the types of 
systems that have been used in this research. 
It was the investigators' belief that at least 
one explanation for the large number of in- 
consistent and “null” findings in process- 
product studies was that the instrumenta- 
tion used to measure classroom behavior, 
particularly teacher process behavior, may 
not exhibit construct validity. The findings 
of this study tend to support this belief, since 
about half of the teacher process behaviors 
investigated failed to pass tests for conver- 
gent and discriminant validity. While no 
reference to specific process-product studies 
need be made, the investigators suggest that 
many such studies have measured behaviors 
with the same or similar forms of instru- 
mentation at the same or similar levels of 
inference as the instruments and categories 
employed in this study. 
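
The "about half" figure can be re-derived by tallying the per-category outcomes stated in the Results. The sketch below is only an arithmetic check: the outcome labels are transcribed from the text describing Tables 2-9, the variable names are illustrative, and counting CC18 (whose CCO mean was 0.00) as lacking convergent validity is an assumption. On this tally, 13 of the 23 CCs show convergent validity, which is the count that matches the reported 57%.

```python
# Outcome per comparison category (CC), transcribed from the Results text:
#   "both"       = passed convergent and discriminant tests
#   "convergent" = significant validity value but failed a discriminant step
#   "neither"    = nonsignificant validity value
outcomes = {
    "CC1": "both", "CC2": "both",                            # Table 2
    "CC3": "both", "CC4": "both", "CC5": "neither",          # Table 3
    "CC6": "neither", "CC7": "neither", "CC8": "neither",    # Table 4
    "CC9": "both", "CC10": "neither", "CC11": "both",        # Tables 8 and 9
    "CC12": "neither", "CC13": "both", "CC14": "neither",
    "CC15": "convergent", "CC16": "neither",                 # Table 5
    "CC17": "neither", "CC19": "convergent",
    "CC18": "neither",  # assumption: CCO mean was 0.00, so not testable
    "CC20": "both", "CC21": "both",                          # Table 6
    "CC22": "both", "CC23": "both",                          # Table 7
}

total = len(outcomes)
n_convergent = sum(v in ("both", "convergent") for v in outcomes.values())
n_both = sum(v == "both" for v in outcomes.values())


def pct(n):
    """Percentage of all CCs, rounded to the nearest whole number."""
    return round(100 * n / total)


print(total, n_convergent, pct(n_convergent), n_both, pct(n_both))
print(total - n_convergent, pct(total - n_convergent))
```

The same tally gives 10 CCs (43%) without evidence of convergent validity.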

Based on the results of studying five 
classroom observation systems, the impli- 
cations are not particularly encouraging for 
researchers who choose to measure class- 
room interaction with existing instrumen- 
tation. One can infer that of the hundreds 
of other observational coding instruments 
that have been developed, many must con- 
tain categories that do not meet the stan- 
dards of convergent and discriminant va- 
lidity proposed in this study. Process- 
product researchers as well as those who at- 
tempt to aggregate and accumulate the 
findings of process-product research might 
well be advised to exercise caution in drawing
conclusions from studies that use classroom 
observation systems for which the mea- 
surement technique itself accounts for 
the behavior being 
measured (lack discriminant validity) or that 


incorporate behaviors that, when measured 
by different systems, fail to correlate (lack 
convergent validity). 

As a result of using the multitrait-multi- 
method technique to evaluate validity, two 
types of instrument problems became ap- 
parent. The first concerns the redundancy 
or overlap of behavioral measures within 
systems, reducing a construct’s chances of 
exhibiting discriminant validity. While 
complete independence of the behaviors 
measured within a system is not expected, 
significant interrelationships among be- 
havioral categories substantially reduce the 
chances of these categories passing tests for 
discriminant validity. In several instances 
in this study, interrelationships among be- 
haviors precluded any chance of a category 
exhibiting discriminant validity. For ex- 
ample, in the first heterotrait-monomethod 
triangle in Table 4, CC6 and CC7 correlated 
.7470, the highest correlation in the matrix.
Note that in this instance even if the validity 
diagonal values had been beyond significance
(r.05 = .325), they probably would not
have surpassed the heterotrait-monomethod 
value, and thus, the category's discriminant 
validity would still have been rated “poor.” 
When pilot testing classroom observation 
instruments, authors might delete highly 
redundant categories or attempt to reduce 
the significant interrelationships among 
such categories by providing more specific 
operational definitions for the behaviors 
being measured. 
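
The comparison described above, in which a category's validity diagonal value is set against the heterotrait values surrounding it, can be sketched in a few lines. This is only an illustrative sketch: the function name `discriminant_check`, the use of absolute values when comparing correlations, and the sample correlations are all assumptions introduced here, not the scoring rules or data of this study.

```python
from typing import List, Tuple


def discriminant_check(validity: float, comparisons: List[float]) -> Tuple[float, int]:
    """Return the comparison correlation of largest magnitude and the number
    of comparison correlations whose magnitude exceeds the validity value.

    Comparing magnitudes (absolute values) is an assumption of this sketch.
    """
    highest = max(comparisons, key=abs)
    n_higher = sum(1 for r in comparisons if abs(r) > abs(validity))
    return highest, n_higher


# Hypothetical correlations for one matched category (not data from the study).
validity_value = 0.31
heteromethod = [0.12, -0.44, 0.35, 0.05]  # different behavior, different system
monomethod = [0.55, 0.20, -0.10]          # different behavior, same system

hetero_high, hetero_n = discriminant_check(validity_value, heteromethod)
mono_high, mono_n = discriminant_check(validity_value, monomethod)
print("heteromethod:", hetero_high, hetero_n)
print("monomethod:", mono_high, mono_n)
```

A category with a count of zero in both comparisons would be a candidate for good discriminant validity under this simplified rule; any nonzero count signals overlap with other categories.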

The second instrument problem that 
came to light with the multitrait-multi- 
method technique was the relatively large 
number (43%) of teacher behaviors that
failed to correlate significantly with behaviors
assessed on other instruments with
which they were matched, that is, lacked 
convergent validity. While some of these 
numbers might be accounted for by the 
inexactness of the matching process in 
applying the multitrait-multimethod tech- 
nique to classroom observation instruments, 
in general the matches that were made in 
this study may be considered conservative 
and were supported by the same or similar 
operational definitions. Thus, some of the 
seemingly similar constructs of the type re- 
viewers of process-product studies relate 
across studies when aggregating process- 
product findings were found, in fact, not to 
be similar in this study. One might account 
for this finding by method variance that 
confounded the measurement of approxi- 
mately half the behaviors in this study, 
vague operational definitions of behaviors 
when actually interpreted by coders, and/or 
intrinsic coder differences. This lack of 
convergent validity suggests that the de- 
scriptive titles of categories and behavioral 
constructs employed in some observational 
coding systems may not adequately repre- 
sent the behavior they purport to measure. 
Since this is a between-systems problem, 
authors might turn to standard theoretically 
based operationalizations of their constructs 
when developing new systems.

Evaluating convergent and discriminant 
validity with the multitrait-multimethod 
procedure is one approach to assessing the 
validity of an instrument. The purpose of
the remaining portion of this discussion will 
be to bring to light several nuances encoun- 
tered in its use in this study and the theo- 
retical assumptions underpinning the tech- 
nique. 

Campbell and Fiske (1959) introduced the 
technique with examples drawn primarily 
from the literature in personality and in- 
dustrial psychology. In these examples, 
authors attempted to assess various traits 
(e.g., assertiveness, cheerfulness, poise, 
popularity, and intelligence) using two or 
more methods (e.g., self rating vs. peer rating 
and paper-and-pencil test vs. direct obser- 
vation). Thus, the authors of these studies 
devised different methods for measuring the 
same variables. 

Our use of the technique was somewhat
different. Rather than use different methods
to measure the same variables, we took
existing methods that measured a variety of
variables and tried to specify the variables
that were measured in common across
methods. Thus, our methods and variables
were not tailor-made to our research situation.
Instead, they were fitted to our research
needs.

Our approach was to treat the overlap
across systems as the basis for constructing
comparisons that could be used to determine
the convergent and discriminant validity of 
behavioral categories within systems. Our 
success at this was dependent on our ability 
to create fair and accurate matches across 
systems that defined behavior categories 
differently. Matching in this case is not the 
same as in the studies cited by Campbell and 
Fiske, where different methods were de- 
signed to measure the exact same variables. 
Thus, the applicability of the multitrait- 
multimethod technique to a particular va- 
lidity problem depends on the redundance 
of categories and operational definitions 
across instruments and the conciseness with 
which matches can be made. 


In addition to semantic differences among
category definitions, differences sometimes
exist between the way a category is defined
in a manual and the way it is actually used by
coders. If a system is to be used reliably by
raters, categories must be clearly operationalized
for the coders. This, of course, is
the purpose of training. However, it is not 
possible to include in a definition in a man- 
ual all of the information necessary to code 
a particular category reliably. Often coders 
find it necessary to create “ground rules” to 
delimit the boundaries of particular cate- 
gories. Coders, for example, might have 
difficulty distinguishing between the categories
“teacher accepts” and “teacher approves.”
To distinguish between these behaviors,
they may create certain ground rules
for coding. For example, coders might de- 
cide that if the teacher uses an exclamation 
such as “Oh!” or “My!” in regard to a stu- 
dent’s comment, the proper code is “teacher 
approves”; otherwise, the code is “teacher 
accepts.” From our experience in this study, 
ground rules like this are not uncommon, 
and while they do not seem to distort the 
meaning of the categories, they delimit their
meaning in a way that might not be apparent
to a reader of the manual. Furthermore, it 
is not uncommon for coders or system au- 
thors to modify ground rules to fit different 
classroom situations. Since the actual op- 
erationalizations of categories can change 
from coder to coder or from study to study 
(depending on the classroom situation being 
coded), the manual definitions, besides being 
somewhat ambiguous, are at times only 
guidelines to the meaning of the catego- 
ries. 

Certain theoretical considerations are also 
of interest. One of these concerns the in- 
dependence of methods of measurement. 
The multitrait-multimethod technique is
based on the use of independent methods of 
measuring the same variables. Although 
Campbell and Fiske note that independence 
is a matter of degree, Calkins, Malitz, Na- 
talicio, and Mote (Note 3) point out that the 
“determination of validity is enhanced by 
the inclusion of methods of measurement 
which are as diverse as possible” (p. 2). The 
reason for this is that the “determination of 
convergent and discriminant validity for a 
set of variables can be obfuscated if all traits 
have been quantified by the same method of 
measurement. If all the traits were quan- 
tified by the same method, high correlations 
could result because all the variables share 
‘method variance’ ” (Calkins et al., Note 3, 
pp. 1-2). The extreme case of noninde- 
pendence is where exactly the same method 
is used to measure the variables. In this 
case, the values in the validity diagonal are 
merely reliability values. Since high reli- 
ability can be obtained in the absence of 
validity, this extreme case would not address 
the issue of validity. Thus, to the extent 
that the methods are independent, the 
multitrait-multimethod technique will yield 
useful validity data. Given that the inde- 
pendence of different classroom observation 
systems may be difficult to assess, one 
should include in studies using the multi- 
trait-multimethod procedure a variety of 
methods in order to assure the maximum 
amount of independence among measure- 
ment instruments. 

This study tested the applicability of the 
multitrait-multimethod validation proce- 
dure to classroom observation instruments. 
The study brought to light two types of
instrument problems and several assumptions
of the technique that define the context in 
which multitrait-multimethod is most appropriate.
It was found that the applicability
of the multitrait-multimethod procedure
can be expected to vary across validity
studies, depending on two primary
considerations: (a) the conciseness with which
behavioral categories can be matched across 
classroom observation systems and (b) the 
degree to which the investigator can include 
comparison instruments in the validity study 
of sufficient variety to assure a reasonable 
degree of independence among methods. 
To the extent that these considerations are 
addressed, the validation procedures em- 
ployed in this study were found to constitute 
a potentially economical and practical model 
for examining the construct validity of other
classroom observation systems.

Comparison
category                                              Category numbers in
(CC)      General category description                respective system

Flanders/CCO (see Table 2)
CC1       Gives directions                            6/7
CC2       Silence or confusion                        10/13
Flanders/OScAR (see Table 3)
CC3       Accepts feelings, praises, encourages       1, 2/2, 12, 22, 32, 42, 52, 62, 72, 82, 92
CC4       Criticizes or justifies authority           7/6, 16, 26, 36, 46, 56, 66, 76, 86, 96
CC5       Student talk-initiation                     9/10, 20, 30, 40
CVC/STARS (see Table 4)
CC6       Asks for feelings                           3/10b
CC7       Gives feelings                              7/10a
CC8       Disagrees or disapproves                    13, 14, 15, 16/1b, 1c, 1d
CVC/OScAR (see Table 8)
CC9       Informs (facts)                             5, 6/3, 23
CC10*     Informs (rules)                             8/4, 5, 7
CC11      Accepts facts and interpretations           9, 10/22, 32, 33, 42, 43, 52, 53, 62, 63, 72, 73, 82, 83, 92, 93
CC12      Accepts feelings and plans                  11, 12/2, 12, 13, 19
CC13      Rejects facts and interpretations           13, 14/26, 35, 36, 45, 46, 55, 56, 65, 66, 75, 76, 85, 86, 95, 96
CC14      Rejects feelings and plans                  15, 16/6, 15, 16, 17
CCO/OScAR* (see Table 5)
CC15      Asks questions                              1, 2, 3/8, 50, 60, 70, 80, 90
CC16      Gives suggestions                           6/9
CC17      Gives directions                            7/4, 5, 7, 17, 19
CC19      Perfunctory agreement/disagreement          14/14, 34, 44, 54, 64, 74, 84, 94
STARS/OScAR (see Table 6)
CC20      Restructuring                               2b/7
CC21      Telling, informing                          7a, 7b/3
Flanders/CCO/OScAR (see Table 7)
CC22      Accepts feelings, praises, encourages       1, 2/10/2, 12, 22, 32, 42, 52, 62, 72, 82, 92
CC23      Criticizes or justifies authority           7/12/6, 16, 26, 36, 46, 56, 66, 76, 86, 96

Note. Flanders = Flanders System of Interaction Analysis; CCO = Classroom Communication Observational System;
CVC = Cooperative Educational Research Laboratory, Inc., Verbal-Behavior Classification System;
OScAR = Observation Schedule and Record; STARS = Spaulding Teacher Activity Rating Schedule.

* The mean for CCO's CC18 = 0.00.
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Ninth-grade students from Western and Middle-Eastern ethnic backgrounds 
(N = 1,033, in 30 classrooms) responded to a multiscale attitude questionnaire 
at the beginning and end of their first year in desegregated high schools. West- 
ern students’ assessment of their own ethnic group declined over the course of 
the year, while students from the Middle-Eastern group expressed a more pos- 
itive attitude toward and greater acceptance of members of their own ethnic 
group at the conclusion of the year than at the start. Students from both 
groups continued to view persons from the Western group as superior to those 
of Middle-Eastern background at the conclusion of the year. Thus, each ethnic
group registered change in its members’ relation to their own group rather
than toward the other group. Contrary to current claims, neither the extent of 
previous ethnic contact nor the degree of ethnic mix in the desegregated class- 
room was found to affect attitudinal outcomes of desegregation. 


Israel recently established comprehen- 
sive high and junior high schools. Ethnically 
heterogeneous populations were formed by 
drawing students from neighborhood ele- 
mentary schools with ethnically more ho- 
mogeneous populations. These desegregated 
secondary schools are presently changing the 
pattern of de facto segregation, which pre- 
vailed in many schools heretofore, between 
Israel’s two Jewish ethnic groups, namely, 
Jews from the Moslem countries of the 
Middle-East and those from Western 
countries of Europe, the Union of South 
Africa, and North and South America. 

Social scientists agree that school deseg- 
regation per se does not lead automatically 
to the improvement of relationships and 
attitudes between members of different 
ethnic groups (Allport, 1954; Amir, 1969, 
1976; Cook, 1962; St. John, 1975). What, 
then, are the conditions under which desegregation
exerts a positive influence on students?
Before an adequate reply to this
question can be forthcoming, several important
social and environmental variables
affecting desegregation outcomes must be 
explored, such as: To what extent were the 
subjects exposed to interethnic contact prior 
to their participation in a given study? To 
what extent was each ethnic group repre- 
sented in the integrated school? What was 
the academic and social status of the students
from each group involved in the desegregated
setting?
Furthermore, most research heretofore 
investigated only one of the following four 
critical aspects of attitudes relevant to the 
ethnically desegregated setting: The high- 
status group’s attitudes (a) toward the low- 
status group and (b) toward itself; the low- 
status group’s attitudes (c) toward the 
high-status group and (d) toward itself. All 
four of these aspects should be encompassed 
because different effects of desegregation are 
possible in each one independently of the 
others. Policy decisions can be based on 
misleading information if all aspects of the 
problem are not given due consideration. 
The present study assessed the ethnic 
attitudes and preferences of students from 
both ethnic groups in Israel attending 
desegregated high schools and from all four 
points of view described above. Moreover,
the attitudinal data were analyzed as a
function of several variables judged to be 
potential sources of influence on the out- 
come of ethnic contact and integration. The 
variables on which this study focused were 
social and academic status, sex, ethnic rep- 
resentation in the desegregated classroom, 
and previous ethnic contact. 

The importance of social status in the 
ethnic contact situation was emphasized by 
Allport (1954). Kramer (1950) noted that 
personal status characteristics relevant to 
the contact situation, as well as features of 
general social status, determine the outcome 
of intergroup contact. The present study 
included measures of general status in soci- 
ety, namely, academic achievement and in- 
telligence, as well as measures of status 
variables operating within particular class- 
rooms, such as sociometric status. 

Regarding the attitudinal correlates of sex
in desegregated schools, research in the
United States suggests that girls from the 
lower-status ethnic group tend to encounter 
greater difficulties in adjusting to the inte- 
grated setting than do their male ethnic 
peers (Carithers, 1970; Martinez-Monfort, 
1971; St. John, 1975). What differences, if 
any, exist between the sexes regarding their 
attitudes toward other ethnic groups or 
toward themselves as members of particular 
ethnic groups? The present investigation 
studied changes in the way both sexes eval- 
uated the other ethnic group as a function of 
desegregation. 

Finally, the degree of ethnic representa- 
tion in the composition of the classroom is 
generally considered to be a potential influ- 
ence on the effects of school desegregation 
(Coleman et al., 1966; Pettigrew & Pajonas, 
Note 1). Previous investigators concentrated 
on ethnic representation in the classroom in 
terms of its relationship to achievement 
outcomes. Studies of ethnic composition of 
classrooms and its effect on ethnic attitudes 
and attitude change are virtually unavail- 
able. This study considers the influence of 
ethnic mix on attitudes and preferences at 
various levels of ethnic representation in the 
desegregated classroom. The extent of ethnic 
contact experienced by the students prior to 
their entry into high school was also considered.
Attitude change could manifest itself
only among students entering a desegregated
setting for the first time, while those
with prior extensive exposure to members of 
other ethnic groups may not reveal any 
change. 


Method 


Subjects 


Ninth-grade students (N = 1,033) from 30 classrooms 
in seven high schools, located in different parts of Israel, 
participated in this study. The schools ranged from 
vocational schools serving primarily students of low 
social class, to comprehensive high schools whose stu- 
dent bodies came from all social classes, to high schools 
attended primarily by students with records of high
academic achievement. Five schools were coeducational; 
two vocational schools were for boys only. This range 
of schools was calculated to reflect the major strata 
found in Israel’s secondary educational system. Com- 
parison of a standard achievement measure from the 
sample in this study with scores on this measure of the 
entire Israel population indicates that our sample can 
be regarded as representative of the population in this 
respect. There were 419 students from families of 
Western background, 614 students from families of 
Middle-Eastern background. The classrooms were 
comprised of students entering high school from a va- 
riety of elementary schools and, therefore, they had no 
common past as groups prior to their entry into the 
ninth grade. 


Measures and Procedures 


A questionnaire was constructed, consisting of several 
different kinds of scales, to assess the ethnic perceptions 
and preferences of Middle-Eastern and Western stu- 
dents. In light of the inconsistent findings reported with 
different measures of ethnic attitudes (Brand, Ruiz, & 
Padilla, 1974), a multiscale questionnaire was used to 
provide a broad basis for evaluating the validity of the 
findings. Five scales were included, as follows: (a) So- 
ciometric questions were used to evaluate ethnic pref- 
erences in close-to-real behavior situations. (b) A 
“Participation in Activities” scale was conceived as 
tapping the behavioral dimension of attitudes. (c) 
Questions in the format of the semantic differential 
were intended to elicit the cognitive-evaluative di- 
mension of attitudes. (d) Questions regarding the degree 
of similarity between pairs of ethnic groups afforded 
subjects the opportunity to express their own judgments 
about the differences between ethnic groups. (e) A final scale
assessed identification with one's own ethnic group.

Factor analyses were performed on data obtained 
from 237 students with an experimental version of the 
above scales. A factor analysis of 15 sociometric ques- 
tions revealed that they loaded on two factors: (a) 
questions about personal preferences and friendships, 
such as “Write the names of four of your best friends in 
this class” and (b) questions to identify the “stars” or
most influential figures in the class, for example, “Write
the names of four students in this class who are most 
often able to get other students to carry out their sug- 
gestions.” 

Each student received two sociometric scores, one for 
each of the two factors listed above. The probability of 
any student choosing someone from his or her own or from
the other ethnic group was calculated separately for
each classroom. If the subject’s preferences coincided 
with the ethnic representation in his class, he received 
a score of 500. A score above 500 reflects a tendency 
to choose persons of the same ethnic group with greater 
frequency than expected on the basis of probability.
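
The article does not give the formula behind these sociometric scores. Purely as an illustration of a score with the two stated properties (500 when choices match the class's ethnic composition, above 500 when same-group classmates are chosen more often than chance would predict), one hypothetical index is sketched below. The function `same_group_index`, its 0 to 1000 range, and its piecewise-linear form are assumptions made here and should not be read as the study's procedure.

```python
def same_group_index(n_same_chosen: int, n_chosen: int,
                     n_same_in_class: int, class_size: int) -> float:
    """Hypothetical 0-1000 index that equals 500 when a student's choices
    mirror the ethnic composition of the class.

    n_same_in_class counts the chooser; the chooser is excluded when the
    chance proportion is computed, since students do not choose themselves.
    """
    if n_chosen <= 0:
        raise ValueError("the student made no choices")
    expected = (n_same_in_class - 1) / (class_size - 1)
    if not 0.0 < expected < 1.0:
        raise ValueError("index is undefined when only one group can be chosen")
    observed = n_same_chosen / n_chosen
    if observed >= expected:
        # Scale the excess over chance onto the 500-1000 range.
        return 500.0 + 500.0 * (observed - expected) / (1.0 - expected)
    # Scale the shortfall below chance onto the 0-500 range.
    return 500.0 * observed / expected


# Hypothetical class of 31 students, 16 of them (chooser included) from one group.
for same in range(5):
    print(same, "of 4 same-group choices ->", same_group_index(same, 4, 16, 31))
```

Because the chance proportion is computed per classroom, the same pattern of choices can yield different scores in classes with different ethnic mixes, which is the point of calculating the probability separately for each classroom.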


The second scale pertains to one’s willingness to en- 
gage in activities with members of both ethnic groups. 
These questions loaded on two separate factors: one 
comprised of questions about activities with Western 
students and the other about activities with Middle- 
Eastern students. It should be noted that each of the 
three main questions in this category required the 
subject to respond on a 4-point scale (from 1, “inter- 
ested,” to 4, “not interested”) to five subquestions re- 
ferring to two Western and to three Middle-Eastern 
countries. For example: “Would you be interested in 
inviting a student to your house whose family came from 
... ? (Rumania, Iraq, United States, Yemen, Morocco).” 
Each student received two scores in this category: the 
first was the average score for activities with peers from 
Western countries and the second was his average score 
for activities with students from Middle-Eastern 
background. 
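
The scoring of this scale can be sketched as follows, assuming each main question is stored as a mapping from country to a rating between 1 and 4. The grouping of the five countries follows the text; the function name `activity_scores` and the sample answers are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

WESTERN = ("Rumania", "United States")
MIDDLE_EASTERN = ("Iraq", "Yemen", "Morocco")


def activity_scores(responses: List[Dict[str, int]]) -> Tuple[float, float]:
    """Average the 1 ("interested") to 4 ("not interested") ratings across all
    main questions, separately for Western and Middle-Eastern countries."""
    western = [question[country] for question in responses for country in WESTERN]
    middle_eastern = [question[country] for question in responses
                      for country in MIDDLE_EASTERN]
    return mean(western), mean(middle_eastern)


# Hypothetical answers of one student to the three main questions.
answers = [
    {"Rumania": 1, "Iraq": 2, "United States": 1, "Yemen": 3, "Morocco": 2},
    {"Rumania": 2, "Iraq": 2, "United States": 1, "Yemen": 2, "Morocco": 3},
    {"Rumania": 1, "Iraq": 3, "United States": 2, "Yemen": 3, "Morocco": 3},
]
western_score, middle_eastern_score = activity_scores(answers)
print(round(western_score, 2), round(middle_eastern_score, 2))
```

Lower averages indicate greater interest, so this hypothetical student is more willing to engage in activities with peers of Western background.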


The third scale consisted of four pairs of antonyms 
in the format of the semantic differential (good-bad; 
foolish-smart; clean-dirty; harmful-helpful) presented 
in a 7-point scale for assessing persons whose families 
came to Israel from the above-named countries. Factor 
analysis yielded two factors: one encompassing the re- 
sponses regarding persons of Middle-Eastern back- 
ground and the other including the responses assessing 
persons of Western background. The scores were an
average of the subjects' responses to all the questions
in each index, ranging from 1 (positive) to 7 (negative).

The fourth scale consisted of six questions about the 
degree of similarity between persons from six given pairs 
of countries, two Western by three Middle-Eastern 
countries. For example, “How similar are children 
whose families come from Rumania and Iraq?” These 
judgments were interpreted as an indirect indicator of 
ethnic preferences. Judgments of more similarity be- 
tween persons from countries representing different 
ethnic groups were viewed as reflecting more ethnic 
acceptance than were judgments of less similarity. In 
each paired comparison, one country represented the
Western group, while the second country belonged to 
the Middle-Eastern group. Responses were indicated 
on a 5-point scale from 1 = greatest similarity to 5 = 
greatest dissimilarity. All six comparisons loaded highly 
on a single factor. The score consisted of the average 
response to all six questions.

The final measure was a set of six questions, based on
Peres (1976), to assess the degree of identification with 
one’s own ethnic group. Factor analysis revealed that 
these questions loaded on two factors: two questions
about attraction to one’s group, such as “If you could
be born again, would you want to be a member of your
present ethnic group?” and four questions regarding
the salience of one’s ethnic background in influencing 
one's thoughts and behavior: “Does the fact that you 
belong to your ethnic group influence what you say and 
do?” Responses were indicated on a 4-point scale, on 
which 1 = a great amount, 4 = a small amount. 

Two types of measures were employed as indicators 
of the students’ classroom status as an independent
variable: one measure of academic status and one of 
social status. Academic ability and achievement level 
of the subjects was obtained from their scores on a 
standardized test administered routinely by the school 
system to all prospective graduates of the eighth grade. 
This test included questions on a variety of subject- 
matter areas, as well as requiring problem solving typ- 
ical of general intelligence tests. Data obtained from this 
test made possible the division of the sample into dif- 
ferent levels of academic achievement. 

The students’ sociometric scores were employed as 
a measure of their social status. These sociometric scores
differed from those described above. The latter scores 
focused on the students making the choice, whereas the 
scores being presented here were based on the students 
being chosen. The scores were constructed by taking 
into account the number of choices received by each 
student in relation to class size, number of total choices, 
and other critical variables. The basic principles of the 
procedure employed are described elsewhere (Amir, 
Kovarsky, & Sharan, 1970). Factor analysis of the so- 
ciometric data yielded three factors: popularity (positive 
choices), rejection (negative choices), and isolates. 

The questionnaire was administered twice to each 
ninth-grade class: once during the third week of the 
school year, a second time during the final month (June) 
of the academic year. Data from the achievement test 
were obtained from the school authorities. 


Results 


Change in each of the dependent variables 
was examined as a function of the school and 
classroom in which the student was located. 
No effect was found for either of these two 
variables using both analysis of variance and 
analysis of covariance techniques. Conse- 
quently, all the following analyses were 
performed on subjects across schools and 
classrooms. 

1 We are indebted to M. Snyder, Scientific Director
of the Bar-Ilan University Computer Center, for developing
the mathematical formula by which the sociometric
data were analyzed. Details were presented
at the Israel National Statistics Conference, Haifa, June
1974.

Five major variables, considered to be
possible sources of the changes which occurred,
were analyzed in several analyses of
variance. These variables were as follows: (a) 
previous ethnic contact, as measured by the 
ethnic composition of the eighth-grade 
classrooms in the elementary schools from 
which the sample participating in this study 
had been graduated; (b) present ethnic 
contact, as measured by ethnic representa- 
tion in the ninth-grade classrooms in high 
school; (c) general academic status, mea- 
sured by a standardized achievement test; 
(d) sociometric status; and (e) sex. These 
analyses of variance, with pretest scores 
serving as covariate, were performed on the 
data from all nine indices for each of the two 
ethnic groups separately. 

Surprisingly, analyses of previous ethnic 
contact and present ethnic representation in 
the classroom failed to yield significant re- 
sults. Thus, remarkably, different propor- 
tions of ethnic mix, either in the elementary 
school classrooms where students studied 
before entering high school or in their 
present class in high school, did not exert 
differential effects on change in ethnic attitudes
and preferences.

Findings in this study fall into three main 
categories: change in ethnic attitudes and 
preferences as a function of (a) ethnic 
membership, (b) sexual identity, and (c) 
academic and sociometric status. Only sig- 
nificant findings are reported here. 


Attitude Change as a Function of Ethnic 
Membership 


A multivariate analysis (Hotelling’s T) 
was conducted to test for the overall differ- 
ence between the means of the pretest and 
posttest measures. One such analysis was 
performed on data obtained from Middle- 
Eastern subjects; a second such analysis was 
performed on data obtained from Western 
subjects. Both tests yielded significant 
findings (for Middle-Eastern subjects: T = 
32.81, F = 115.16, p < .001; for Western 
subjects: T = 30.21, F = 95.91, p < .001). 

The measures of ethnic attitudes em- 
ployed here were largely independent of each 
other, since they were constructed by factor 
analysis. In light of this fact, several t tests (see Footnote 2)
were carried out on the pretest and posttest
data from each of the variables. Table 1 reveals
that Western students expressed less
of an “in-group” orientation on several 
measures at the end of the year in desegre- 
gated classrooms (Measures 1, 3, 4, 5, 6), 
while students of Middle-Eastern back- 
ground enhanced their perception and ac- 
ceptance of their own ethnic peers over what 
they had stated at the beginning of the year 
(Measures 1, 2, 5, 6, 7). These changes were 
not accompanied by an increased rejection 
of the Western by the Middle-Eastern stu- 
dents. Western students continued to enjoy 
preferential status in their own eyes as well 
as in those of the Middle-Eastern stu- 
dents. 
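
For readers who want to see how a pretest-posttest comparison of this kind can be computed, a minimal sketch of a t statistic for paired scores follows. It assumes each student's pretest score is matched with his or her own posttest score; the article says only that t tests were carried out on the pretest and posttest data, so the paired form, the function name `paired_t`, and the sample ratings are assumptions.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def paired_t(pre: Sequence[float], post: Sequence[float]) -> float:
    """t statistic for the mean of the posttest-minus-pretest differences."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    spread = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (spread / sqrt(len(diffs)))


# Hypothetical ratings from five students (not data from the study).
pretest = [2.0, 2.5, 3.0, 2.0, 2.5]
posttest = [2.5, 2.5, 3.5, 2.0, 3.5]
t_value = paired_t(pretest, posttest)
print(round(t_value, 3))
```

The statistic has n - 1 degrees of freedom, where n is the number of pairs, and its sign shows the direction of change on the scale in question.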


Attitude Change as a Function of Sex 


Differential changes in interethnic atti- 
tudes as a function of sex emerged among 
Western students only (see Table 2). West- 
ern boys became more positively inclined 
toward their Middle-Eastern peers on sev- 
eral attitudinal measures after their year in 
desegregated classrooms. Western girls, on 
the other hand, responded to these questions 
in a more negative fashion at the end of the 
year than they had earlier. Indices assessing 
intraethnic attitudes revealed that Mid- 
dle-Eastern boys and girls displayed greater 
attraction to their own group, even though 
boys changed more than did the girls. Mid- 
dle-Eastern boys also increased their at- 
tention to their ethnic identity (salience), 
while girls were lower on this measure at the 
end of the year. 


Attitude Change as a Function of 
Academic and Social Status 


Middle-Eastern students of low academic
status evidenced positive change in their 
evaluations of traits of Western persons, 
while those of high academic status evalu- 
ated Western persons in a less favorable light 
at the conclusion of the year (see Table 3). 


2 In addition to the statistical analyses presented in
Table 1, the data obtained with each of the nine indices 
were analyzed by a two-way analysis of variance for 
repeated measures (Ethnic Group × Pretest/Posttest).
Both t and F statistics yielded identical findings. For 
the sake of simplicity, t results are reported here.


Table 1
Mean Pretest and Posttest Scores and Significance Levels for Ethnic Attitudes of Western and
Middle-Eastern Students

                                                      Western                        Middle-Eastern
Measure                                      Pretest  Posttest  t             Pretest  Posttest  t

1. Activity with Middle-Eastern students       2.25     2.32    [illegible]     2.26     2.13    [?].15**
2. Middle-Eastern characteristics                                               3.36     3.24    2.23*
3. Western characteristics                     2.45     2.61    3.24**
4. Middle-Eastern/Western similarity           4.02     3.90    2.81**
5. Sociometry: friends                       629.45   602.72    2.31*         476.30   521.52    4.38**
6. Sociometry: stars                         660.49   622.64    [illegible]   414.75   448.07    2.80**
7. Ethnic identification: attraction                                            2.20     2.07    2.55*

Note. On all measures except for the two sociometric indices, high scores indicate negative attitudes, low scores indicate positive
attitudes. On the sociometric indices, scores above 500 indicate a tendency to select same-group persons, scores below 500 indicate
a tendency to select persons of the other group.
* p < .05.
** p < .01.

On the other hand, Middle-Eastern students 
of both academic levels viewed their own 
group more favorably at the conclusion of 
the year, although those of high academic 
status disclosed greater change toward pos- 
itive evaluation of their own group's personal 
traits than did their ethnic peers of low ac- 
ademic status. 

Data regarding the perception of the de- 
gree of similarity between the two groups 
also appear in Table 3. High achieving 
Middle-Eastern students perceived less of 
a difference between the two ethnic groups 
at the end of the year than they had beforehand, 
even though they had continued to view the 
two groups as being quite different. 


Table 2 
Mean Difference Scores and Significance Levels for Attitude Change in Male and 
Female Western and Middle-Eastern Students 
Western Middle-Eastern 
Measure Boys Girls F Boys Girls F 

Middle-Eastern characteristics .14 −.18 5.42* 
Sociometry: stars 61.96 25.33 7.87** 
Ethnic identification: attraction .19 .10 4.74* 
Ethnic identification: salience .11 −.11 7.04** 

Note. Positive scores indicate positive change; negative scores 
indicate negative change. The higher the score, the greater the 
change. 
*p < .05. 
**p < .01. 

Western students of high academic status 
also changed their evaluation of the traits of 
persons from their own ethnic group. They 
evaluated Western persons less favorably 
than they had done at the beginning of the 
year. Western students of low academic 
status did not change their evaluations. 

Out of 54 analyses of variance of data re- 
garding sociometric status, 6 emerged as 
statistically significant. All of the findings 
point to the more popular and less rejected 
students exhibiting greater positive change 
in their ethnic attitudes and preferences. 


Table 3 
Mean Difference Scores and Significance Levels for Attitude Change of Western and 
Middle-Eastern Students on Two Levels of Academic Status 

                                          Western                  Middle-Eastern 
                                     High    Low                High    Low 
Measure                              status  status  F          status  status  F 

Middle-Eastern characteristics                                   .45     .10    3.94* 
Western characteristics              −.23    −.06    8.42**     −.12     .08    4.69* 
Middle-Eastern/Western similarity                                .08    −.02    5.13* 

Note. Positive scores indicate positive change; negative scores 
indicate negative change. The higher the score, the greater the 
change. 
*p < .05. 
**p < .01. 


The Middle-Eastern Jewish group in Israel, 
which is of relatively low social status, 
registered improvement in their perception 
and general acceptance of their own ethnic 
group after spending a year in desegregated 
ninth-grade classrooms. The higher status 
Jewish group from Western countries ex- 
pressed some decline in self-acceptance, less 
of a tendency to choose primarily members 
of their own ethnic group as friends, and 
more willingness to affiliate with Middle- 
Eastern students, even in the absence of 
cognitive-evaluative changes toward their 
Middle-Eastern peers. Thus, each ethnic 
group changed in its relation to itself, rather 
than revealing improved attitudes toward 
the other group, although these changes can 
have important implications for interethnic 
relationships. 

It should be noted that this study en- 
compassed attitudes toward one's own group 
and toward the other group of both the 
high-status and low-status ethnic groups. 
Had the investigation focused exclusively on 
one or the other ethnic group alone or on 
intergroup relations only, as is found 
frequently in current research, rather than 
considering self- as well as other-group 
attitudes and preferences, the findings 
presented here regarding changes in attitudes 
toward one's own group would not have 
come to light. This study emphasizes that 
the main attitudinal problem confronting 
the low-status group in the area of ethnic 
relations is its relation to itself, while the 
main problem of the higher status group is 
its attitude toward the lower status group. 

Noteworthy, too, is the finding that both 
ethnic groups continued to attribute superior 
qualities to and prefer members of the 
Western group. Middle-Eastern students 
evidenced no change in their high evaluation 
of their Western peers. This suggests that 
the increased self-acceptance expressed by 
the low-status ethnic group at the end of the 
year, on attitudinal and behavioral measures, 
did not stem from an in-group isolationism 
characterized by rejection of the 
high-status group. Indeed, the Western 
group retained its preeminent position in the 
ethnic attitudes of the Middle-Eastern students, 
even while the latter improved their 
own self-image and intensified their attrac- 
tion to their own group. There is no evidence 
in these data that cross-ethnic comparisons 
in the desegregated classroom in Israel in- 
creased the threat of identity loss or loss of 
self-esteem to members of the low-status 
group as found elsewhere (Pettigrew, 1969; 
Rosenberg & Simmons, 1971; St. John, 
1975). 

The evidence collected here certainly 
differs from the data reported recently in one 
of the major investigations of school deseg- 
regation in the United States which revealed 
increased ethnic alienation on several so- 
cial-psychological variables after years of 
contact in school settings (Gerard, Jackson, 
& Conolley, 1975). What can account for 
these differences between findings reported 
by Gerard et al. and those presented here? 
One source of the differences could be that 
the individual classrooms studied in the 
present sample exhibited a more limited 
range of academic achievement than that 
found in the general Israeli population. The 
standard deviation of the achievement scores 
of the entire population of eighth-grade 
students in Israel, on the same standardized 
test employed in the present study, was 
12.18, while the mean standard deviation on 
this test of the classrooms which participated 
in this study was 8.12. This latter range of 
achievement scores could be more limited 
than that of the typical classroom studied by 
Gerard and Miller (1975). Relatively homo- 
geneous academic status prevailing within 
classrooms could impose limits on cross-ethnic 
invidious comparisons and could approximate 
conditions of equal-status contact more closely 
than do academically heterogeneous 
classrooms. Many investigators 
have noted that equal-status conditions for 
ethnic contact are more conducive to ethnic 
integration (Allport, 1954; Amir, 1976). By 
contrast, the desegregated junior high 
schools in Israel, with wide academic heter- 
ogeneity, appear to foster greater interethnic 
strain than the schools studied in the present 
investigation (Amir, Rich, & Ben-Ari, Note 
2). The significance of academic status even 
within the relatively homogeneous class- 
rooms which participated in this study is 
evident in the differential changes in the 
ethnic perceptions of students of different 
academic status (see Table 3). 
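
The restriction of range described above can be made concrete with a small calculation. The sketch below is an illustration only: it takes the two standard deviations reported in the text (12.18 for the national eighth-grade population and 8.12 as the mean within-classroom value) and expresses the classroom spread as a share of the national spread. The variable names are assumptions introduced for the example and do not come from the study.

```python
# Standard deviations reported in the text (same standardized achievement test).
population_sd = 12.18   # all eighth-grade students in Israel
classroom_sd = 8.12     # mean SD of the classrooms that took part in the study

# Ratio of the two spreads, and the corresponding ratio of variances.
sd_ratio = classroom_sd / population_sd
variance_ratio = sd_ratio ** 2

print(f"SD ratio: {sd_ratio:.2f}")
print(f"Variance ratio: {variance_ratio:.2f}")
```

On these figures the typical classroom showed about two thirds of the national spread in achievement scores, which corresponds to somewhat less than half of the national variance.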

A set of subject variables characterizing 
the sample in this study was analyzed in an 
attempt to identify some of the sources of the 
changes in ethnic attitudes and preferences 
reported above. Several positive findings of 
interest emerged, but the absence of findings 
was more prominent than their presence. 
The proportion of ethnic representation in 
the classroom was not found to be related to 
change in attitudes and preferences. An ex- 
amination of all levels of ethnic desegrega- 
tion, from no representation of Western 
students all the way to 90% representation 
of the Western group, uncovered no nu- 
merical combination of the two ethnic 
groups which exerted a differential effect on 
the change in ethnic attitudes and prefer- 
ences. Moreover, the extent to which the two 
ethnic groups were represented in the com- 
position of the eighth-grade classrooms in 
elementary school also had no effect on the 
degree to which the students changed their 
ethnic attitudes when they were in the ninth 
grade in high school. These findings suggest 
that ethnic mixing per se does not constitute 
a sufficient condition for positive attitudinal 
and behavioral change in ethnic relations in 
Israeli high schools. This conclusion con- 
tradicts claims by investigators that there 
are certain optimal proportions of ethnic 
representation in classrooms which facilitate 
ethnic integration (Blalock, 1967; Coleman 
et al., 1966; Pettigrew & Pajonas, Note 1). 
True, recommendations for specific per- 
centages of ethnic mix are based on findings 
about the academic achievement of minor- 
ity-group students in desegregated schools, 
while results here derive from attitudinal 
measures. But there is clearly no support in 
the data obtained in this study with social- 
psychological variables for a policy of ethnic 
desegregation in schools based on predetermined 
proportions of ethnic representation. 

Regarding attitudinal change as a function 
of sex and academic achievement, boys from 
both ethnic groups changed more during 
their year in a desegregated setting than did 
the girls. This is consistent with research 
reports in the United States in which 
desegregation was found to be more socially 
facilitating for boys than for girls (St. John, 
1975). Also, students of high academic 
achievement evidenced greater attitudinal 
change than did those of low academic 
achievement in both ethnic groups. High- 
achieving Middle-Eastern students even 
manifested a decline in their stereotypical 
perception of their Western peers, while 
high-achieving Western students perceived 
their own group's characteristics as being less 
outstanding than they had thought pre- 
viously. Apparently, minority-group stu- 
dents of lesser academic ability have less 
reason to change their attitudes toward the 
majority group, either because they confirm 
their relatively disadvantaged status in the 
desegregated classroom or because they lack 
any chance for social acceptance and mo- 
bility. 

Despite these cognitive changes in the 
ethnic attitudes of the high-level Western 
group, there were no comparable changes in 
the more behavioral measures (sociometric 
questions and willingness to participate in 
social activities). High-achieving Western 
students may have been less impressed with 
their own group's superiority, by comparison 
with the Middle-Eastern group, at the con- 
clusion of the year, but they were not more 
willing to affiliate with the Middle-Eastern 
students than were the low-achieving 
Western students. Sociometric status also 
influenced change in ethnic attitudes and 
preferences to some degree: Popular stu- 
dents changed their attitudes somewhat 
more positively than did less popular stu- 
dents. Nevertheless, differentiation in levels 
of academic and sociometric status cannot 
account for a significant portion of the 
changes which occurred in the ethnic atti- 
tudes and preferences of Western and Mid- 
dle-Eastern students over the course of the 
year (see Table 1). 

The data available in this study also do 
not account for the absence of changes in 
some variables as opposed to others. For 
example, why did Western students choose 
members of their own ethnic group less fre- 
quently as sociometric friends at the end of 
the year but fail to express change on cog- 
nitive-evaluative measures? Furthermore, 
the question remains, if ethnic contact per 
se does not exert a major influence on ethnic 
attitudes, what does account for the changes 
observed? While no satisfactory reply is 
available at this time, some potentially 
fruitful areas of research are apparent. This 
study did not gather information on the 
process of ethnic contact in the classroom or 
elsewhere in the desegregated school. The 
nature of ethnic contact in school could 
certainly be one source of influence on the 
attitudinal and behavioral consequences of 
desegregation. Casual or intimate encounters 
between students from varying ethnic 
backgrounds may be one variable in the 
process of ethnic contact worthy of closer 
examination. Also, the extent to which dif- 
ferent ethnic groups participate in and exert 
influence on school activities constitutes 
another set of potentially important vari- 
ables which should receive greater attention 
from social scientists (Amir, 1976; Cohen & 
Roper, 1972; Cohen & Sharan, Note 3). 
Clearly, desegregation must be accompanied 
by programs specifically designed to foster 
ethnic integration if school desegregation is 
to make a worthwhile contribution to ethnic 
relations (Cook, 1963). 


Subsumption Versus Educational Set: Implications for 
Sequencing of Instructional Materials 


Claudia E. McDade 
Jacksonville State University 


This study compared the implications of educational set and subsumption for 
sequencing subject-matter presentations. Educational psychology undergraduates 
were taught in alternating instructional sequences. Sequence CF was a 
conceptually oriented lecture followed by a factually oriented self-study; 
Sequence FC, a factual self-study followed by a conceptual lecture. Every 
student was taught in each sequence twice. An achievement test followed 
each sequence. Subsumption predicts superior performance in Sequence CF 
without interaction between instructional sequence and educational set. The 
educational set literature predicts an interaction between sequence and set, in 
which conceptually set students would perform better in Sequence CF and 
factually set students in Sequence FC. Results were closer to the predictions 
derived from the educational set position than from the subsumption position. 


A current issue in applied human learning 
concerns the optimal sequencing of 
educational materials to enhance student 
learning and retention. Two separate lines 
of evidence reveal conflicting predictions. 
Ausubel’s (1963, 1967, 1968) cognitive 
structure theory of school learning 
emphasizes the principle of subsumption, the 
presumed nervous system process of knowledge 
organization. According to this theory, 
students learn and retain most effectively if 
they are taught general, all-inclusive 
concepts first, which then act as anchors for 
subsequent details, illustrations, and 
examples. However, different implications for 
sequencing follow from the construct of 
educational set (Siegel & Siegel, 1965, 1966, 
1967), a cognitive style with which a learner 
approaches new instructional material based 
on his past encounters with such material. 
Educational set is defined as a continuum 
ranging from preference to learn factually 
oriented material to preference to learn 
conceptually oriented material. It is 
measured by a forced-choice, objectively scored 
group inventory, the Educational Set Scale 
(ESS; Siegel & Siegel, 1965). A factually set 
learner prefers factual content for its own 
sake and is not motivated to interrelate the 
facts into a more complex framework. A 
conceptually set learner accepts facts as 
elements to be interrelated into a broader 
contextual whole, to learn principles, 
concepts, theories, and relationships. The 
educational set literature predicts that 
conceptually set students will learn and retain 
more information when taught in a 
subsumptive sequence, from the most inclusive 
concepts to the most specific details. 
Conversely, factually set students will perform 
better when taught in the opposite sequence, 
from the specifics toward principles and 
concepts. 


This article is based on research submitted in partial 

Thus, two contrasting predictions con- 
cerning instructional sequencing follow 
logically from the positions taken by the two 
instructional theories. Ausubel emphasizes 
the structure of material to be learned; the 
Siegels emphasize the moderating effects of 
learner characteristics. While Ausubel 
predicts that subsumptive sequencing will 
be efficacious for all learners, the Siegels 
predict that such sequencing will benefit 
only learners whose educational sets are 
congruent with subsumptive sequencing. 


Method 


Subjects 


Students in an educational psychology evening sec- 
tion, taught at Louisiana State University in the spring 
semester 1974, served as subjects. When the study 
began, 90 students were enrolled in the course. 

Initially, sex, American College Test (ACT) scores, 
and ESS scores were recorded for each student. The 
ACT provides a measure of ability; ESS, a measure of 
cognitive style. A distribution of students according 
to their educational set was constructed; it was tri- 
chotomized by arbitrarily designating subjects scoring 
within the lower and upper thirds of the ESS distribu- 
tion as factually and conceptually set, respectively. The 
middle third of the students was dropped, so that only 
factually and conceptually set students were used in the 

experiment. 
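
The trichotomizing step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `trichotomize`, the student identifiers, and the scores are hypothetical, and ties at the cut points are resolved by sort order only, which the article does not specify.

```python
def trichotomize(ess_scores):
    """Split students into factually and conceptually set groups.

    ess_scores maps a student identifier to an ESS score. Students in the
    lower third of the ranking are labeled factually set, those in the upper
    third conceptually set, and the middle third is dropped.
    """
    ranked = sorted(ess_scores, key=ess_scores.get)  # lowest score first
    third = len(ranked) // 3
    factual = ranked[:third]
    conceptual = ranked[len(ranked) - third:]
    return factual, conceptual


# Hypothetical scores for nine students (not data from the study).
scores = {"s1": 12, "s2": 30, "s3": 18, "s4": 25, "s5": 9,
          "s6": 21, "s7": 27, "s8": 15, "s9": 33}
factual, conceptual = trichotomize(scores)
print("Factually set:", factual)
print("Conceptually set:", conceptual)
```

Because the cut points depend on the sample at hand, the labels are relative to the enrolled class rather than to any external norm, which is consistent with the article's description of the designation as arbitrary.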


Procedure 


All students were divided randomly into two in- 
structional sections, with the restriction that each sec- 
tion contained approximately equal numbers of fact- 
ually and conceptually set male and female students. 
The students were informed that they were partici- 
pating in research. 

A night class is composed of a heterogeneous popu- 
lation with a higher proportion of older students than 
is typical in undergraduate classes. Since Ausubel’s 
and the Siegels’ theories were based mainly on adoles- 
cents, there exists little evidence as to whether their 
conclusions hold for older adults. It was decided to 
exclude students over 30 years of age from the pool of 
critical subjects. 

Recognizing the difficulty of isolating factual and 
conceptual content in psychology, portions of the edu- 
cational psychology course were structured deliberately 
to investigate the conflicting predictions from Ausubel 
and the Siegels. One instructional sequence (i.e., Se- 
quence CF) consisted of a conceptually oriented lecture 
followed by a factually oriented self-study. Students 
in Sequence CF were taught from inclusive, general 
concepts toward specific details according to the sub- 
sumption principle. The other instructional sequence 
(i.e., Sequence FC) consisted of a factually oriented 
self-study followed by a conceptual lecture in which 
students were taught from specifics toward more gen- 
eral concepts. 

The lecture format was used for the conceptual, in- 
tegrative function of each sequence because it was 
considered a more appropriate mode for such a function. 
The presentation of factual material was judged more 
appropriate to the self-study format. Both formats 
were necessary to allow presentation of all materials 
within the time allowed. The design thus confounds 
conceptually oriented instruction with the lecture ap- 
proach and factually oriented instruction with the 
self-study approach. 

Although the content in each sequence was highly 
similar, the varying orientations necessitated different 
procedures and objectives for each sequence. Table 1 
outlines the principles that regulated each sequence. A 
committee of judges (N = 4) evaluated the application 
of these principles to the sequence outlines for each 
topic before it was presented to the students. 

One instructor delivered the lecture portion of both 
sequences. A proctor assisted in the self-study sessions. 
Self-study consisted of reading summaries of research 
evidence supporting generalizations made in the ac- 
companying lecture. The sequence of study of these 
summaries corresponded to their appearance in the 
lecture. Sets of questions following each summary 
further differentiated the two self-study sessions, as 
outlined in Table 1. 

Each evening session lasted 3 hours. Each half of the 
sequence took approximately 70 minutes, with ap- 
proximately 10 minutes spent in changing classrooms. 
The remainder of the period was used to test students 
for subject-matter retention. 

The procedure was implemented on four separate 
nights in a design diagrammed below: 


Session  Topic                                          Section  Sequence 

1        Teacher characteristics                        1        CF 
                                                        2        FC 

2        Creativity                                     1        FC 
                                                        2        CF 

3        Theories of learning applied to the classroom  1        CF 
                                                        2        FC 

4        Discovery learning                             1        FC 
                                                        2        CF 

An examination administered to both sections on 
each night provided data to determine if one sequence 
was more effective in fostering overall performance. 
Each consisted of 20 multiple-choice questions, ap- 
proximately half of which were judged by a committee 
to be factual in content. These questions tested in- 
formation directly given by the lecture or self-study. 
The remaining examination questions were judged to 
be conceptual in content, covering material not directly 
stated in the lecture or self-study. The committee was 
composed of three members of the Department of 
Psychology faculty and the author; reliability of judg- 
ments was r = .84. 

It was necessary to take the dependent measure im- 
mediately following the manipulation (rather than 
subsequently) to eliminate potentially contaminating 
effects of study habits, note exchanges among students, 
and so on. Students were told that the dependent 
measure contributed to their final grade in the 
course. 

In this manner, four separate dependent measures 
were taken; the last two sessions acted as a replication 
for the first two. A different graduate student 
lectured in each of the four sessions. The topics chosen 
represented a broad range of difficulty, from material 
easily related to students’ existing cognitive categories 
in the session on teacher characteristics to material less 
easily subsumed into existing structures in the sessions 
on learning. 


Table 1 
Criteria for Instructional Sequences 

Sequence CF 
I. Conceptual lecture 
   A. Advance organizer 
      1. Present appropriate introductory material 
         at a high level of abstraction, generality, 
         and inclusiveness. 
      2. Begin with most general organizer. 
      3. Supplement lecture with a hierarchical 
         series of organizers in descending order of 
         inclusiveness. 
   B. Presentation of content 
      1. Begin with concepts, propositions, 
         principles with widest explanatory power, 
         generality, inclusiveness, and integrative 
         power. 
      2. Work toward instances, examples, 
         empirical evidence, and illustrations. 
      3. Present definition of each key word as it is 
         discussed. 
      4. Mention use of self-study cases as 
         illustrative of principles involved. 
II. Factual self-study 
   A. Presentation of content 
      1. Sequence summaries from most conceptual 
         and general to most factual and 
         detailed. 
   B. Questions to accompany self-study 
      1. Sequence questions from most conceptual 
         (requiring answers not directly given in 
         self-study) to most factual (requiring 
         comprehension and knowledge of 
         material). 

Sequence FC 
I. Factual self-study 
   A. Presentation of content 
      1. Sequence summaries from most factual and 
         detailed to most general and conceptual. 
   B. Questions to accompany self-study 
      1. Sequence questions from factual (requiring 
         comprehension and knowledge of material) 
         to conceptual (requiring answers not 
         directly given in self-study). 
II. Conceptual lecture 
   A. Presentation of content 
      1. Incremental presentation 
         a. Begin with individual studies, examples, 
            and illustrations, using as many as 
            possible. 
         b. Work toward generalizations, concepts, 
            principles, and propositions in logical 
            inductive order. 
      2. Use of separate topics 
         a. Compartmentalize particular ideas or 
            topics within lecture. 
         b. Segregate topically homogeneous 
            materials without reference to other 
            topics in the lecture. 
         c. Fully explain each topic independently 
            from others in the lecture. 
      3. Present definition of each key word at 
         beginning of the lecture. 
   B. Summary 
      1. Present principles, concepts, and 
         propositions that follow inductively from 
         the specifics taught. 
      2. Integrate previous information into 
         concepts and principles. 
      3. Rely on repetition, condensation, selective 
         emphasis on central concepts of lecture, 
         and self-study. 

Statistical Procedure 


Multiple regression analyses were performed for each 
dependent variable, with the regression sum of squares 
partitioned into sequence, ACT, educational set, and 
sex. Four such analyses were required to analyze 
separately the results from each of the four instructional 
sessions, since four separate dependent variables were 
involved. 


Results 


The four multiple regression analyses, one 
for each session, are summarized in Table 2.¹ 
Since the pattern of means as a function of 
Sequence (i.e., CF or FC) and as a function 
of Sequence × ESS interaction is of particular 
interest, these means are given for all 
four analyses in Table 3. 


Discussion 


If Ausubel’s principle of subsumption 
actually parallels nervous system function, 
students taught with material sequenced 
accordingly (i.e., Sequence CF) should have 
performed better on the examinations than 
those students taught in the opposite 
sequence (i.e., Sequence FC). The principle 
of subsumption does not, by itself, anticipate 
a significant interaction between educational 
set and sequence, since Ausubel does not 
regard learner variables as moderators of 
sequence effects. 


Taken together, the four dependent 
measures lend support to the predictions 
from the educational set literature. 
Significant interactions between educational set 
and sequence were seen in Sessions 1 and 2 
with definite, but nonsignificant, trends in 
the predicted direction in Sessions 3 and 4. 


¹ Complete analyses for the four dependent measures 
are available from the author. 


Table 2 
Multiple Regression Analysis Summaries 

Session 1 
Source                      df  MS     F 
Regression                  8   44 —   4.64** 
Sequence (A)                1   2.03   2.24 
Educational Set Scale (B)   1   1.86   2.05 
Sex (C)                     1   2.29   2.54 
A × B                       1   12.67  14.01** 
A × C                       1   .37    .41 
B × C                       1   .06    .07 
A × B × C                   1   .09    .10 
Error                       32  .90 
*p < .05. 
**p < .01. 

Thus, it appears that conceptually set 
students did perform better when they were 
in Sequence CF; factually set students, when 
they were in Sequence FC. More extensive 
research is necessary to investigate these 
interactions further. 
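
The direction of the Sequence × ESS interaction can be checked descriptively from the adjusted means in Table 3. The sketch below is an illustration, not the author's analysis: it reads the Table 3 entries as two-decimal percentages, and the names `ADJUSTED_MEANS` and `interaction_contrast` are assumptions introduced here. The contrast is the CF-minus-FC difference for conceptually set students minus the same difference for factually set students, so a positive value lies in the direction predicted by the educational set literature.

```python
# Adjusted means (percentage correct) from Table 3, keyed by session.
# Each entry: (conceptual CF, conceptual FC, factual CF, factual FC).
ADJUSTED_MEANS = {
    1: (71.34, 64.34, 54.44, 71.13),
    2: (58.05, 43.07, 55.18, 54.74),
    3: (70.77, 71.27, 64.87, 75.50),
    4: (69.35, 65.65, 63.00, 79.70),
}


def interaction_contrast(conceptual_cf, conceptual_fc, factual_cf, factual_fc):
    """CF advantage for conceptually set students minus that for factually set."""
    return (conceptual_cf - conceptual_fc) - (factual_cf - factual_fc)


contrasts = {s: interaction_contrast(*m) for s, m in ADJUSTED_MEANS.items()}
for session, value in contrasts.items():
    print(f"Session {session}: {value:.2f} percentage points")
```

With these readings all four contrasts are positive. This is a descriptive check only; it ignores cell sizes, the ACT covariate, and sampling error, so it does not replace the significance tests reported in Table 2.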

The failure to obtain statistical signifi- 
cance in the latter two sessions may have 
reflected one or more limitations of the ex- 
perimental design. First is the possibility 
that certain topics in psychology are not 
amenable to sequencing in clear-cut induc- 
tive or deductive fashion, despite the ex- 
perimenter's efforts to arrange them. 


Briefly, the intended design may have failed 
on Sessions 3 and 4. Second, the data may 
reflect an adaptation phenomenon or a 
confounding due to students changing from 
sequence to sequence. By Sessions 3 and 4, 
the effects of the experimental manipulation 
may have been muted. Although adaptation 
is a tenable explanation for the nonsignificant 
findings in the latter two sessions, 
this seems unlikely in view of the rather 
large, albeit nonsignificant, interactive 
differences in the predicted direction in Session 
4. Finally, attrition reduced cell size in the 
latter sessions, perhaps to the point where 
even substantial differences failed to attain 
statistical significance. 

The results emphasize the necessity of 
considering individual differences among 
students both in designing curricula and in 
investigating instructional procedures. The 
classroom situation is multivariate, rich with 
differences among students, instructors, 
environments, and curricula, any of which 
singly or in interaction could affect 
educational outcomes. The growing movement 
toward individualizing instruction can gain 
support from these results. Student learning 
may be promoted by using individual 
differences in student cognitive styles when 
designing instructional strategies. 


Session 2 Session 3 Session 4 
df MS F df MS F df MS F 
8 —.02 225* 8 -03 186 8  .02. Sq 
1 5.30 5.02 1 191 205 1 246 “99mm 
1 143 1.36 1 04 o4 -1 26288 
1 38 .36 1 44 «15 1 164 .66 
1:482 457". 1 161 1.72 1 6.05 246 
1:179. 1.70 1 2.75 295 1 355 144 — 
17322971770 1 2.75 2.95 1 58  .24 
L:7418 12 1 08 105 1 250 102 
31 1.05 29 . 93 24 246 


Table 3 
Adjusted Means (Percentage Correct), Standard Deviations, and ns for the Group and Sequence 
× Educational Set Scale (ESS) Subgroups 

                          Session 1           Session 2           Session 3           Session 4 
Group                     n    M      SD      n    M      SD      n    M      SD      n    M      SD 
Total group               41   66.34  9.51    40   53.13  10.27   38   70.66  9.67    33   66.52  15.69 
Sequence × ESS conceptually set subgroup 
  Sequence CF             12   71.34  2.02    10   58.05  2.31    12   70.77  1.47    8    69.35  3.20 
  Sequence FC             9    64.34  1.22    11   43.07  1.78    9    71.27  3.96    11   65.65  2.88 
Sequence × ESS factually set subgroup 
  Sequence CF             10   54.44  2.23    10   55.18  2.46    6    64.87  2.34    9    63.00  3.59 
  Sequence FC             10   71.13  1.90    9    54.74  1.81    11   75.50  2.56    5    79.70  3.56 

Note. Means are expressed as a percentage of correct responses on the examinations. Sequence CF = conceptually oriented 
lecture followed by factually oriented self-study; Sequence FC = factually oriented self-study followed by conceptually oriented 
lecture. 


A Note on Class Effects in Aptitude X Treatment Interactions 


Jan-Eric Gustafsson 
Göteborgs Universitet, Mölndal, Sweden 


Results pertaining to methodological aspects of an Aptitude X Treatment 
interaction study are presented. One treatment group in the study was given 
imagery instructions; the other had no imagery instructions. Each treatment 
group consisted of seven fifth-grade classes. Among the aptitude variables 
there were two versions of a paired-associates learning task. Within-class anal- 
yses and analyses in which class effects were allowed to have influence were 
conducted. In the latter analyses there were several significant Aptitude X 
Treatment interactions with subscores derived from the paired-associates 
tasks, but this was not the case in the within-class analyses. The interactions 
found are interpreted as being consequences of class effects with respect to 
errors of measurement. 


Cronbach and Snow (1977) made a “rad- 
ical reappraisal of the ATI [Aptitude X 
Treatment interaction] model” (p. 99), as- 
serting the necessity of separating be- 
tween-class and within-class components of 
Aptitude X Treatment interaction effects. 
The rationale behind this suggestion was 
that the Aptitude X Treatment interaction 
may arise not just through individuals’ dif- 
ferential responses to treatments. Some 
processes may affect the class as a unit, and 
the pupil’s relative standing in the class may 
sometimes be of functional importance. 

Cronbach and Webb (1975) presented a 
reanalysis of a study by Anderson (1941). In 
the reanalysis, between-class and within- 
class components of the within-treatment 
regressions were determined. The original 
analysis yielded a strong and seemingly 
interpretable Aptitude X Treatment interaction. 
In the reanalysis there was no interaction 
based on pooled within-class regressions, 
and an apparent interaction at the 
class level was due to an accident of sam- 
pling. 
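
The distinction between between-class and within-class regression components can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the procedure used by Cronbach and Webb: the between-class slope is taken as an unweighted regression of class mean outcome on class mean aptitude, the within-class slope pools deviations from each class's own means, and the data and function names are hypothetical.

```python
from statistics import mean


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def between_and_within_slopes(classes):
    """classes maps a class id to a list of (aptitude, outcome) pairs."""
    mean_x, mean_y = [], []   # one point per class
    dev_x, dev_y = [], []     # pupil deviations from own class means
    for pupils in classes.values():
        xs = [x for x, _ in pupils]
        ys = [y for _, y in pupils]
        mx, my = mean(xs), mean(ys)
        mean_x.append(mx)
        mean_y.append(my)
        dev_x.extend(x - mx for x in xs)
        dev_y.extend(y - my for y in ys)
    return slope(mean_x, mean_y), slope(dev_x, dev_y)


# Hypothetical data: two classes within one treatment.
classes = {
    "a": [(1, 2), (2, 3), (3, 4)],
    "b": [(4, 1), (5, 2), (6, 3)],
}
between, within = between_and_within_slopes(classes)
print(f"between-class slope: {between:.2f}")
print(f"pooled within-class slope: {within:.2f}")
```

In this contrived example the two slopes differ in sign, which shows how a regression computed over all pupils without regard to class membership can mix two distinct relationships.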

The reanalysis of the Anderson study 
indicates that separation of between-class and 
within-class effects may be profitable not 
only in the study of substantive hypotheses 
related to processes at the group level and at 
the individual level. The technique can also 
help track anomalies in the data. It is shown 
below that the separation of individual 
effects and group effects is warranted also in 
studies in which the latter kind of effect 
cannot be assumed to be interpretable in 
substantive terms. 


Requests for reprints should be sent to Jan-Eric 
Gustafsson, Pedagogiska Institutionen, Göteborgs 
Universitet, Fack, S-431 20 Mölndal, Sweden. 


Method 


An experiment was conducted to study differential 
effects of imagery instructions on pupils with different 
abilities. (For a full presentation, see Gustafsson, 1977.) 
Two treatment groups, each consisting of seven fifth- 
grade classes randomly assigned to treatments, studied 
material dealing with two monkeys. One of the treat- 
ment groups (called the I treatment) was told to gen- 
erate visual imagery, the other group (the NI treatment) 
studied the material in a regular fashion. Immediately 
after studying the material, the subjects were given a 
posttest, consisting of three types of scrambled items, 
which yielded three dependent variables for the anal- 
ysis. One item type, called simple, asked for terms or 
figures. Another type, labeled complex, asked for more 
elaborate definitions. The third, called description,
asked for information concerning pictorial descriptions 
of the two monkeys. 

The subjects were also given a series of aptitude tests. 
The test battery included two verbal, one reasoning, and 
three spatial tests. Also included in the test battery were 
two parallel forms of a paired-associates (PA) learning 
task with pictures and words as scrambled items. This 
article concentrates on the PA test. However, for com- 
parison, results are also presented for one verbal test 
(opposites) and one spatial test (metal folding). For 
descriptions of these tests, the reader is referred to 
Gustafsson (1976, 1977). 
CLASS EFFECTS IN APTITUDE x TREATMENT INTERACTIONS


The PA test was modeled after an instrument de- 
scribed by Levin, Divine-Hawkins, Kerst, and Guttman 
(1974). For each form, 11 word pairs and 11 picture pairs 
were chosen. Their photographs were mounted on slides 
to be projected on a screen. The administration of the 
PA test took place, as did all the other learning and 
testing, with intact classes. The subjects were first given 
information about the task and some sample practice 
items. Then the 22-item list was presented. Each item 
in the test was shown for 4 sec, and the list was pre- 
sented twice, in two random orders. After the second 
presentation the subjects were given a list of the stim- 
ulus terms and were to supply the response terms. The 
number of word pairs (words) recalled and the number 
of picture pairs (pictures) recalled were scored sepa- 
rately. 

The two parallel forms of the PA test were presented 
on two occasions, separated by about 3 weeks. The 
subscores resulting from the first PA test are referred 
to as Pictures 1 and Words 1; those from the second PA 
test, as Pictures 2 and Words 2. The totals are referred 
to as Pictures T and Words T. On each occasion half the 
number of the other aptitude variables were also ad- 
ministered. On the second occasion the experiment 
proper was the first activity of the session. On both occasions the PA
test was administered as the last test.

Since the data collection took place on two occasions, there was an
attrition of the treatment groups. According to the class lists, there
were 169 and 173 pupils in the NI and I groups, respectively. Since only
subjects with complete data were included, 141 and 130 subjects,
respectively, were available for analysis.


Procedures in the Analysis


Separate regression analyses were computed for each aptitude-outcome
pair. These included a quadratic term to allow for curvilinear regressions
of the dependent variable on the aptitude variable within treatments. In
one set of analyses (labeled pooled), within-class and between-class
effects were not separated. In these analyses, raw scores on the dependent
variables, and deviation scores around the grand mean on the aptitude
variables, were used.

For the within-class analysis (called within), deviation scores around
class means for the dependent variables as well as for the aptitude
variables were used. Classes were pooled, thereby allowing each class to
influence the analysis according to its size. Between-class regressions
were not determined, with only seven classes per treatment.


Table 1
Regression Coefficients and F Ratios for Aptitude × Treatment Interactions
in the Pooled Analyses

Criterion:   Simple        Complex       Description
Aptitude     NI  I  F      NI  I  F      NI  I  F

Opposites 
Linear 154  .260  546* 152 132 Em 312 220 251 
Quadratic -010 -.010 00 .003 -000 Al  -.007 --.006 ES 
M Eum F 2.82 .52 4 
Bee 065 139  434* 046 068 .52 148. 131 d 
Quadratic 008 -003  591* 001 -.004 2.10 000 -006 12 
W. VIR F 5.90* E 4 
ords 1 
i .377 1.98 
L 86 E: E TE EO 4109 3 
Gadratic 0038 -.029 2417 .049  -.055 DES 447  -.033 A 
Bie Interaction F 1.09 6. 5 
er 166  .200 07 064 230 238 O18 3 8.65 
Quadratic 0034 -.011 68 013  .006 po - 4 Ns 
Ww. ee F .55 Site 
Diar 258 059 2.38 984 459 d am ie pus 
Quadratic -002 027 (60  -.002 002 qe : < us 
Pi Interaction F 1.19 ` 
1 
Te 227 096 121 201 01 - 166. 343 E d. 
Quadratic 014 096 9 1050  -.009 467 85 -. 4.65 
Ww. pees F .80 5 
oe 167 420 34 188 149 05, 436 20 E: 
Quadratic 005 -.003 40 ‘010  -.008 5er 010 - 85 
Pict Interaction F a} : 
ie 136 088 42 107 .099 03 -166 PE c 
Quadratic "jo dece E od Ue 0200 15009 656 030 =. T80- 
Interaction F .28 5 

Note. Critical values: F.95(1, 265) = 3.88; F.95(2, 265) = 3.03. The I
group was told to generate visual imagery; the NI group studied in regular
fashion.
* p < .05.

Significance testing presents great problems in the 
analysis of within-class and between-class effects. A 
large number of classes is needed if significant hetero-
geneity of between-class within-treatment regressions
is to be found. Also, the recognition that the class is a 
unit, both in sampling and in treatment, reveals as il- 
legitimate the usual construction of error terms in what 
is here called the pooled analyses, in which the indi- 
vidual is considered the sampling unit. Despite this, 
however, the number of pupils in the classes was used 
for determining the degrees of freedom for the error 
mean square in the pooled analyses, since no other al- 
ternative was available. The same degrees of freedom 
have been used for the within analyses even though each 
class evidently consumes one degree of freedom. How- 
ever, correcting for this would here have little bearing 
on the pattern of results. 
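
The degrees-of-freedom bookkeeping described above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the count of six fitted terms (intercept, treatment, linear, quadratic, and two cross-product terms) is inferred from the model description rather than stated in the text, and the corrected within-class figure is just one plausible way of charging one degree of freedom per class.

```python
# Illustrative arithmetic only; function names are hypothetical.


def pooled_error_df(n_subjects: int, n_terms: int) -> int:
    """Error df when the individual pupil is treated as the sampling unit."""
    return n_subjects - n_terms


def within_error_df(n_subjects: int, n_classes: int, n_slopes: int) -> int:
    """Error df after charging one df per class mean (assumed correction)."""
    return n_subjects - n_classes - n_slopes


n_subjects = 141 + 130   # NI and I pupils with complete data
n_classes = 7 + 7        # seven classes per treatment

# Assumed model size: intercept, treatment dummy, linear, quadratic,
# and two Aptitude x Treatment cross-product terms.
df_pooled = pooled_error_df(n_subjects, 6)

# Assumption: class means absorb the intercept and treatment terms,
# leaving the four slope terms to be estimated.
df_within = within_error_df(n_subjects, n_classes, 4)

print(df_pooled, df_within)
```

Under these assumptions the pooled error term has 265 degrees of freedom, and the corrected within-class term would have 12 fewer, which is consistent with the remark that the correction has little bearing on the pattern of results.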

Statistical tests and the estimation of coefficients of
regression used a “general linear hypothesis” model,
with treatment coded as a dummy variable and the
Aptitude × Treatment interaction effects represented
with cross-product terms. Computations were per-
formed using Program BMD10V (Dixon, 1973).
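
A model of this kind can be sketched in plain Python. The code below is a minimal illustration under stated assumptions, not a reproduction of BMD10V: the function names and the data are invented, the aptitude is centered on the grand mean as in the pooled analyses, and the joint two-degree-of-freedom F ratio is presumably comparable to the "Interaction F" rows of the tables.

```python
# Minimal sketch with hypothetical names and made-up data; not BMD10V.


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residual_ss(rows, y):
    """Fit y on the columns of rows by least squares; return the residual SS."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def interaction_f(aptitude, treatment, outcome):
    """Joint F ratio (2 numerator df) for linear + quadratic interaction."""
    mean_a = sum(aptitude) / len(aptitude)
    dev = [a - mean_a for a in aptitude]          # deviation from grand mean
    # Reduced model: intercept, treatment dummy, linear, quadratic.
    reduced = [[1.0, t, d, d * d] for t, d in zip(treatment, dev)]
    # Full model adds the two cross-product (interaction) terms.
    full = [r + [r[1] * r[2], r[1] * r[3]] for r in reduced]
    ss_reduced = residual_ss(reduced, outcome)
    ss_full = residual_ss(full, outcome)
    df_error = len(outcome) - len(full[0])
    return ((ss_reduced - ss_full) / 2) / (ss_full / df_error), df_error


# Made-up data: 10 pupils per treatment, steeper slope under treatment 1.
aptitude = list(range(1, 11)) * 2
treatment = [0.0] * 10 + [1.0] * 10
noise = [((7 * i) % 5 - 2) * 0.2 for i in range(20)]
outcome = [2 + 0.5 * a + 0.4 * t * a + e
           for a, t, e in zip(aptitude, treatment, noise)]

f_ratio, df_error = interaction_f(aptitude, treatment, outcome)
print(round(f_ratio, 2), df_error)
```

The comparison of a reduced and a full model is one standard way to test cross-product terms; separate tests of the linear and quadratic interaction would drop one cross-product term at a time instead of both.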


Results


Results from the pooled analyses are presented in Table 1. Several
significant interaction effects were found. The opposites test showed a
significant difference in slope between treatments on the simple
criterion; with the same criterion, metal folding showed significant
differences between treatments in both linear and quadratic terms. The PA
variables showed several differences between treatments for the quadratic
terms. But parallel versions of this test give very different results.
For Words 1, there was on the complex criterion a highly significant
interaction; but for Words 2, there were weak nonsignificant tendencies in
the opposite direction for both linear and quadratic terms.

Results from the within analyses are presented in Table 2. The
interactions found for both opposites and metal folding reappeared
essentially unaltered in these analyses. But almost all the interactions
with PA failed to reappear in the within analyses. The only exception was
Words 1 on the complex criterion, and this interaction was considerably
weaker in the within analysis.


Table 2
Regression Coefficients and F Ratios for Aptitude × Treatment Interactions
in the Within Analyses

Criterion:   Simple        Complex       Description
Aptitude     NI  I  F      NI  I  F      NI  I  F

Opposites
Linear 16 348 787 3995. 1188 A2 260 232 36 
Quadratic -.005  -.006 .05 .004  -.001 1.37 -.006  -O011 .30 
Interaction F 4.02* 69 42 
Metal folding 
Linear .055 -143  6.23* .039 .067 1.40 115 158 - 1.00 
Quadratic 008 -.002 - 4.59* 001 -.004 193 000 -.009 237 
Interaction F 5.69* 1.75 1.77 
Words 1 
Linear | 276.286 90 92  .268 67 150-383: 1.69 
Quadratic -004 -015 08 019  -047 589" 011 --016  .28 
Interaction F .04 3.05* 86 
Pictures 1 
Linear - 222 203 02 A17  .227 156 .098 - .381 2.89 
nec SE 032. -019 93 0019  .002 27 -.025  -.059 a 
raction i Š 2.1 
Words 2 49 1.30 
Linear | 205 44733 190  .189 00 177 283 58 
Quadratic =.001 020 30 -011 001 3 -008  .020 .35 
Interaction F 19 14 5 
Pictures 2 
inear - 186  .157 .06 71 147 40 .305 376 25 
Quadratic -013 015 . 48 026 - .001 .85 049  -014 160 
Interaction F 37 42 1.28 
Words T 
Linear 166 136 17 ETRE vv STI .18 233 158 
Quadratic -000. .00 (07 004 -.007 192  -001 -.005  .12 
. Interaction F .09 1.00 81 
Pictures T 
Linear | 130 -120 0 100  .123 27 51  .2507 153 
Quadratic -007 Q0 2 .010 - .001 70 .009 -.014 1.78 
Interaction F 10 .63 2.21 

Note. Critical values: F.95(1, 265) = 3.88; F.95(2, 265) = 3.03. The I
group was told to generate visual imagery; the NI group studied in regular
fashion.
* p < .05.


Discussion and Conclusions


Only methodological points need be dis- 
cussed here; a full presentation of results and 
interpretations in substantive terms for 
verbal and spatial aptitudes is given in 
Gustafsson (1977). 

The results for opposites and metal fold-
ing were essentially the same in the pooled
and within analyses, but PA showed strong
interactions in the former analyses and no 
interactions in the latter. This, along with 
the fact that there were great differences in 
results for the two PA forms, suggests that 
the pooled analysis results for PA are arti- 
facts. One explanation for this could be that 
since one of the PA forms was given after 
treatment, there were reactive effects of 
treatment on this form. However, the cred- 
ibility of this explanation is considerably 
weakened by the fact that there was no sign 
of a treatment main effect on the PA test, 
nor were there any systematic differences 
between treatments in the pattern of inter- 
correlations between the PA variables and 
the other aptitude variables. 

Another, and more reasonable, explanation concentrates on the nature of
the tests themselves. The PA test, which is a learning task, can be highly
sensitive to the instructions given, to the mental alertness of the
subjects, and to other events during administration, whereas the
essentially unspeeded multiple-choice tests, opposites and metal folding,
can be considered to be more immune to influence of such factors. When a
“sensitive” test is given in classes, many of these factors may influence
the pupils more or less in the same way. The effect of this would be to
produce a large intraclass correlation for the errors of measurement.
When this occurs for both aptitude and dependent variables in a study in
which classes are nested within treatments, correlated errors for aptitude
and dependent variables may result. Since these correlated errors may be
of different kinds in different treatments, a “significant”
Aptitude × Treatment interaction may appear.

In the within analyses, in contrast, each subject’s scores are deviations
from the class means; any positive or negative fortuitous effect on the
scores that is common to the class will consequently not affect the
analysis. In this study, the hypothesized class effects with respect to
errors of measurement resulted in a complex Aptitude × Treatment
interaction in the quadratic component. It cannot be assumed, however,
that the linear regressions remain free from such effects. Whenever a
small number of classes is nested within treatments, class effects in the
administration of tests and tasks can have as a consequence interactions
of any kind.
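
Why deviation scores are immune to a disturbance shared by a whole class can be shown with a small sketch. The function name, class labels, and scores below are hypothetical; the only point is that a constant added to every score in a class disappears once scores are expressed as deviations from the class mean.

```python
# Minimal sketch; names and scores are hypothetical.
from collections import defaultdict


def center_within_class(scores, class_ids):
    """Return each score as a deviation from its own class mean."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for score, cid in zip(scores, class_ids):
        totals[cid] += score
        counts[cid] += 1
    return [score - totals[cid] / counts[cid]
            for score, cid in zip(scores, class_ids)]


class_ids = ["A", "A", "A", "B", "B", "B"]
true_scores = [4.0, 6.0, 8.0, 5.0, 7.0, 9.0]

# A disturbance shared by everyone in a class (e.g., a noisy test session).
class_error = {"A": 3.0, "B": -2.0}
observed = [s + class_error[c] for s, c in zip(true_scores, class_ids)]

dev_true = center_within_class(true_scores, class_ids)
dev_observed = center_within_class(observed, class_ids)

print(dev_true)       # deviations without the class-level disturbance
print(dev_observed)   # the same values: the shared disturbance drops out
```

In a pooled analysis the same disturbance would remain in the raw and grand-mean-centered scores, which is how it can come to be correlated across aptitude and outcome measures.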

The present analysis thus emphasizes the 
need to keep track of class-mediated effects 
on experimental variables and aptitude 
variables. Aptitude variables (and classes), 
of course, are sensitive in differing degrees 
to such effects, but whenever the presence of
class-mediated errors of measurement is
suspected, within-class analyses should be
conducted. 

Obviously, attempts should also be made 
to avoid the problem altogether. One might 
administer sensitive tests individually. This, 
however, may not be possible due to the large 
number of subjects needed for Aptitude ×
Treatment interaction studies. If it is im-
possible, subjects from different treatments
could be tested together in small groups. 
Teacher Verbal and Nonverbal Behavioral Expression 
Toward Selected Pupils 
University of Texas at Arlington 
University of Northern Iowa 


Using the context of pupil-initiated questions, selected verbal and nonverbal 
behaviors of teachers were investigated. Twelve teachers in individualized 
classroom settings were asked to select four students subsequently labeled ac- 
cepted, concerned, indifferent, and rejected. Teacher verbal behaviors were 
recorded with the Observation Schedule and Record 5V. Teacher-pupil in- 
terpersonal distance was recorded with the kinesthetic scale of the proxemic 
notation system. Selected pupils completed a school attitude questionnaire. 
School attitudes differed significantly between pupils labeled accepted, con- 
cerned, indifferent, and rejected. No significant differences were found in 
teacher behaviors toward these pupil groups. The usefulness of the situation- 
al context approach to the investigation of teacher behaviors is discussed. 


Educational researchers investigating 
teacher-pupil verbal and nonverbal behav- 
iors have been frustrated by the absence of 
a systematic observational methodology 
(Galloway, 1974). A situational frame con- 
textual approach to the study of expressive 
cues has been suggested in disciplines out- 
side education that may have direct appli- 
cation to teacher behavioral studies within 
naturalistic classroom settings (Hall, Note 
1). Furthermore, it has been demonstrated 
that teacher behavioral studies that focus on 
the teacher’s behavior toward the entire class 
are of questionable validity (Brophy & Good, 
1974). These authors suggest that future 
studies of teacher behaviors should be con- 
cerned with the observation of teacher be- 
havior toward selected pupils. The present 
study implements the observational meth- 
odology suggested by Hall, identifies selected 
pupils, and addresses many of the questions 
raised by Galloway. Selected teacher verbal 
and nonverbal behaviors were simulta- 
neously recorded during interactions with 
selected pupils within a situational context 
labeled the pupil-initiated question. 


Appreciation is expressed to Diane Berthel and 
Steven Silvern for their assistance at selected stages of 
this project. 

Requests for reprints should be sent to Douglas M. 
Brooks, Department of Education, University of Texas, 
Arlington, Texas 76019. 
Method 


Subjects and Context 


The University of Northern Iowa Malcolm Price 
Laboratory School was the site of the study. The lab- 
oratory school instructors are associated with the larger 
university and are characterized as well-trained and 
innovative. The mandate of the laboratory school is 
child-centered instruction. The classrooms included 
in the investigation were all characterized by individ- 
ualized instruction, pupil-determined seating, freedom 
of pupil movement within the class, and varied in- 
structional strategies. 

The student sample was drawn from a population 
that included faculty children, children from the im- 
mediate upper-class neighborhood, and a limited 
number of minority students transported from less af- 
fluent neighborhoods in the community. However, no 
minority group students appeared in the sample. With 
the exception of sex, no other pupil personal data were 
collected. 

Teacher-pupil proximity within the classes was a 
function of teacher-pupil dispositions rather than a 
function of established seating charts or secured, 
structured desk arrangements. Only secondary class- 
rooms were involved in the study, and no secondary 
grade level dominated the sampled classrooms. Stu- 
dent achievement data were not available to us. 


Instruments 


Teacher verbal behavior was recorded using the Observation Schedule and
Record (OSCAR 5V; Medley & Mitzel, 1963). The OSCAR 5V allows recording
of 18 behavioral categories that can be factored into the following eight
orthogonal categories: managing, rebuking, permissive, listening,
lecturing, question source, question difficulty, and question quality. A
factor analysis procedure developed by Medley (Note 2) was used to convert
raw data into [illegible].

The specific instrument used in collection of nonverbal teacher classroom
behavior was the kinesthetic scale of the proxemic notation system (PNS;
Hall, 1963). The PNS was developed as a method of systematically recording
eight dimensions of nonverbal [illegible]. The kinesthetic scale of the
PNS provides spatial symbols that represent observable spatial [illegible]
closest proximity during an interaction, that of actual body contact, to
the proximity that is outside reaching distance [illegible]. The symbols
recognize use of arms, elbows, and knees to either increase or decrease
perceived distance in a dyadic interaction.

The instrument selected as a measure of pupil attitudes toward school was
the Describe Your School Inventory (DYS). The DYS was developed to study
the [illegible] of the Minnesota Teacher Attitude Inventory (MTAI; Cook,
Leeds, & Callis, 1951). The DYS takes the form of a 50-item questionnaire
and consists of simple questions to be answered by underlining yes or no.
A high score reflects positive attitudes toward school. Items are in the
form of questions to minimize acquiescence set. Many of the items reflect
MTAI statements. The DYS shares the MTAI theoretical bias for classroom
democracy and teacher-pupil rapport.


Table 1
Description of the Eight Observation Schedule and Record 5V Verbal Scales

Scale                 Item

Managing Behavior     This is an index of the relative number of events
                      that are concerned with procedural matters, that is,
                      with managing the class. A “really” considerate
                      teacher would be reflected in a negative score.

Rebuking Behavior     This reflects primarily how often a teacher
                      criticizes pupil behavior. A high score would
                      reflect teacher irritability.

Permissive Behavior   A high positive score on this key reflects a
                      permissive teacher (one who lets pupils make
                      decisions). A high negative score reflects an
                      autocratic teacher (one who does not let pupils make
                      decisions).

Listening Behavior    A high-scoring teacher is one who “listens” to a
                      pupil and waits to be sure the pupil is done talking
                      before replying or interrupting. This high-scoring
                      teacher lets a pupil who has just volunteered a
                      comment or question make a second comment without
                      interrupting him.

Lecturing Behavior    This key contrasts the teacher who develops content
                      by lecturing with one who develops it by questioning
                      pupils. A teacher who lectures (talks about content
                      for long periods of time) gets a very high positive
                      score. A teacher who interacts a lot with pupils
                      gets a high negative one.

Question Source       This key contrasts classrooms where pupils initiate
                      relatively more interchanges with classrooms where
                      the teacher initiates relatively more of them. The
                      highest positive scores are associated with the
                      former classrooms; a high negative score with the
                      latter classrooms.

Question Difficulty   This key seems to contrast two kinds of teachers. A
                      high positive score identifies a teacher who asks
                      many questions, mostly convergent, which appear to
                      be easy, since the pupils almost always answer them
                      correctly but are rarely praised (as they should be
                      if the questions are difficult). A high negative
                      score identifies a teacher whose questions elicit
                      answers of more varied quality; some are praised,
                      some are criticized, some rejected, but very few are
                      merely [illegible].

Question Quality      This key contrasts two kinds of teachers. The
                      teacher obtaining a high positive score is probing,
                      questioning to develop more subtle points. This
                      teacher asks mainly elaborating questions and rarely
                      evaluates a pupil response. The teacher obtaining a
                      high negative score asks mainly convergent
                      questions, evaluates pupil responses, and asks
                      another question. This latter style might
                      [illegible] in a rapid-fire drill activity.


Observer Training 


Volunteer undergraduate secondary education majors 
were trained in the use of the kinesthetic scale of the 
PNS and the OSCAR 5V. As a part of the training, 
classroom simulations were conducted, and observer 
accuracy was assessed by percentage agreement with a 
scoring key provided by us. Following the simulation, 
observers watched videotapes of actual classroom in- 
teractions. Observers orally identified the appropriate 
verbal behaviors. When the general average across 
scales of each observer's ratings attained 90% accuracy, 
observers were assigned to classrooms involved in the 
study. Observer accuracy during training was deter- 
mined by comparison of trials with OSCAR 5V encoding 
forms that we had completed. We coded the training 
tapes in advance of the training sessions. 

Incidents of “elaborating 1” and “elaborating 2” on
the OSCAR 5V encoding form were the most difficult
to master. These encoding dimensions reflect occasions
during which the teacher actively listens without in-
terruption, either once (E1) or twice (E2). There is a
tendency to record the beginning of another separate
interchange as pupil statement rather than to attribute
the statement to teacher listening, which is E1 or E2.

A similar training procedure was used to familiarize observers with the
kinesthetic scale of the PNS. The authors modeled the 10 interpersonal
distances in sequence from closest to most distant and then randomized the
distances modeled. Observers were then asked to view videotapes of
classrooms and record teacher-pupil proxemic behavior during
pupil-initiated questions. Observer accuracy was determined by comparison
with PNS encoding forms that we had completed. Ninety percent accuracy was
regarded as mastery.
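
The accuracy criterion used in training amounts to a percentage-agreement calculation against a scoring key. The following sketch shows one way to compute it; the function names and the code sequences are hypothetical, and averaging across scales before applying the 90% criterion follows the description of the OSCAR 5V training given above.

```python
# Minimal sketch; function names and code sequences are hypothetical.


def percent_agreement(observer_codes, key_codes):
    """Percentage of events the observer coded the same way as the key."""
    if len(observer_codes) != len(key_codes):
        raise ValueError("observer record and key must have the same length")
    matches = sum(o == k for o, k in zip(observer_codes, key_codes))
    return 100.0 * matches / len(key_codes)


def has_mastered(scale_accuracies, criterion=90.0):
    """True when the average accuracy across scales reaches the criterion."""
    return sum(scale_accuracies) / len(scale_accuracies) >= criterion


key = [2, 7, 2, 5, 1, 3, 2, 7, 4, 6]        # scoring key for ten events
observer = [2, 7, 2, 5, 1, 3, 2, 1, 4, 6]   # one event coded differently

accuracy = percent_agreement(observer, key)
print(accuracy, has_mastered([accuracy, 95.0]))
```

Percentage agreement does not correct for chance agreement; it is shown here only because it is the index the training procedure describes.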


Procedures and Encoding 


The participating 12 teachers were asked to nominate one student from one
class into each of the four categories labeled accepted, concerned,
indifferent, and rejected.

The nomination of pupils to categories was accomplished by having the
teachers respond [illegible] by Silberman (1[illegible]). [illegible]
“name one student you would like [illegible] in for the sheer joy of it.”
The [illegible] to this question was the accepted student. The teacher was
then asked to “name one student you would spend more time with if you
could.” This student was the concerned student. The teacher was then asked
to name [illegible] “who you [illegible] prepared to talk [illegible]
conference.” This student was the indifferent student. Finally, the
teacher was asked, “If you could reduce your [illegible] by one child, who
would it be?” This student was the rejected student.
Two teachers refused to nominate a student to the [illegible] classroom.
[illegible] a fixed 30-minute time period (5 minutes after the start of
class), the observers were told to record the teacher-pupil verbal and
[illegible] following [illegible] only to attend to the four students
whose names they [illegible]. (b) The observers were told to record, in
four [illegible] of the OSCAR 5V encoding form, [illegible] students
[illegible]. If no [illegible] a selected student and the teacher, no
OSCAR 5V encoding took place. (c) If during a verbal exchange between a
selected pupil and the teacher the pupil initiated a question
(pupil-initiated question is an encoding category on the OSCAR 5V), the
student observer was instructed to circle that symbol on the kinesthetic
scale of the PNS that most accurately represented the observer's estimate
of the physical distance between the teacher and the pupil while the
teacher was responding to the pupil's question. The situational frame
within which the teacher-pupil interpersonal spatial behavior was recorded
began when the pupil initiated a question to the teacher and ended when
the teachers completed their response. Four kinesthetic scale encoding
forms were stapled on the reverse side of the OSCAR 5V encoding form to
facilitate simultaneous recording. What appeared to the teacher to be note
taking on the part of the student observer was in fact systematic teacher
behavioral observation. Three 30-minute observations were made, providing
a total observation time of 90 minutes per class.


Table 2
Distribution of Boys and Girls in the Four Attitude Groups

Group            n      %

Accepted
  Boys           6     13.6
  Girls          6     13.6
Concerned
  Boys           7     15.9
  Girls          4      9.0
Indifferent
  Boys           6     13.6
  Girls          5     [illegible]
Rejected
  Boys           9     20.4
  Girls          1     [illegible]


Table 3
Cell Totals, Means, and Standard Deviations for Personal Space (in feet)
by Groups

                Cell                                    Range
Group           total   M*            SD            Minimum   Maximum

Accepted        43      2.96 (.88)    2.88 (.85)    .10       10.00 (3.04)
Concerned       34      1.95 (.57)    2.16 (.64)    .10       10.00 (3.04)
Indifferent     37      2.67 (.79)    3.01 (.91)    .10       11.00 (3.34)
Rejected        50      4.34 (1.31)   6.97 (2.09)   .10       46.00 (13.80)

Note. Numbers in parentheses are the metric equivalents in meters. A value
of .10 is the equivalent of actual physical contact.
* Overall M = 3.10 (.93 m).


Decoding and Data Analysis 


The interchanges recorded on the OSCAR 5V were transposed into two-digit
numbers. For example, a pupil-initiated question (2) that was followed by
a teacher desisting remark (7) was recorded as an interchange. This
interchange was decoded as the two-digit number 27. The total number of
two-digit interchanges in a single observation was placed on one IBM card.
One card existed for each observation period. A computer program provided
by Medley (Note 2) was used to factor each individual series of two-digit
numbers (one observation). This program reduces each series of two-digit
interchanges into the eight factors reported in Table 1. A multivariate
analysis of variance procedure was performed on the eight verbal factors
across the four pupil categories.
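
The two-digit decoding step can be illustrated with a short sketch. Only the codes 2 (pupil-initiated question) and 7 (teacher desisting remark) and the resulting number 27 come from the text; the function names, the restriction to single-digit codes, and the extra code 5 are assumptions made for the example, and the factoring program itself is not reproduced.

```python
# Minimal sketch; only codes 2 and 7 (giving 27) come from the text.
from collections import Counter

PUPIL_INITIATED_QUESTION = 2
TEACHER_DESISTING_REMARK = 7


def encode_interchange(first_code, second_code):
    """Combine two category codes into one two-digit interchange number."""
    for code in (first_code, second_code):
        if not 0 <= code <= 9:
            raise ValueError("this sketch assumes single-digit category codes")
    return 10 * first_code + second_code


def decode_interchange(number):
    """Split a two-digit interchange number back into its two codes."""
    return divmod(number, 10)


def tally_observation(interchanges):
    """Count how often each two-digit interchange occurs in one observation."""
    return Counter(encode_interchange(a, b) for a, b in interchanges)


# Hypothetical observation period; 5 is a made-up category code.
observation = [(2, 7), (2, 5), (2, 7)]
counts = tally_observation(observation)

example = encode_interchange(PUPIL_INITIATED_QUESTION, TEACHER_DESISTING_REMARK)
print(example, decode_interchange(example), counts[27])
```

A per-observation tally of this kind corresponds to the contents of one card per observation period, which is the unit the factoring program is described as receiving.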

The kinesthetic data as recorded on the PNS were decoded by transforming
the circled symbol for each observation to the actual physical distance
that the symbol represented. Since the incidents of recording
teacher-pupil proximity were conditional on the occurrence of a
pupil-initiated question, the different cell totals in Table 3 are the
result of observations in which a selected pupil did not become verbally
engaged with the teacher. All incidents of verbal behavior were recorded,
but a pupil-initiated question was required before teacher-pupil
interpersonal proximity was recorded. An analysis of variance procedure
was performed on the kinesthetic data across the four pupil categories and
also on the DYS across the four pupil categories.
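
A one-way analysis of variance across four categories can be sketched as follows. The function name and the distances are invented for illustration, and the code is a textbook computation rather than the program used in the study; unequal cell sizes, as in the kinesthetic data, are handled by the same formulas.

```python
# Minimal sketch; the function name and the data are hypothetical.


def one_way_anova(groups):
    """Return (df_between, df_within, ms_between, ms_within, f_ratio)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return df_between, df_within, ms_between, ms_within, ms_between / ms_within


# Made-up distances (in feet) for four pupil categories.
distances = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
]

result = one_way_anova(distances)
print(result)
```

The F ratio is simply the between-groups mean square divided by the within-groups mean square, so any reported analysis of variance table can be checked by dividing its two mean squares.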

We considered a two-way analysis with pupil category 
and teachers entered as main effects and a cell size of 
one. However, this design was not used for the fol- 
lowing reasons: If significant differences between 
teachers were found or a significant (Teacher X Cate- 
gory) interaction was found, any explanation of these 
findings would be speculative because of the conditions 
of data collection. Data are not available to suggest by 
what criteria (i.e., classroom behavior, achievement, 
etc.) students in each class were nominated to the 
categories. (One teacher's accepted student for be- 
havioral reasons may be another teacher's accepted 
student for achievement reasons.) The authors would 
rather err on the conservative side (variability in the 
within-error term) than in the direction of findings (a 
significant interaction) that could not be interpreted 
with any collected behavioral evidence beyond reported 
attitudes toward school. The criteria teachers use to 
place students in the suggested categories and an 
analysis that takes into account such data would be of 
considerable interest in future research. 


Results and Discussion 


Significant differences (p < .01) were obtained on the DYS and are
reported in Table 4. A Kuder-Richardson 20 reliability estimate of .84 is
reported for the DYS.


Table 4
Analysis of Variance for Describe Your School Inventory

Source            df     MS       F

Between groups     3   156.62   4.08*
Within groups     40    38.62

Total             43

* p < .01.


Mean scores for DYS within the selected categories are reported in
Table 5. The difference between the accepted and rejected groups' mean DYS
scores seems intuitively obvious and is therefore not surprising. The
similarity of the concerned and rejected mean DYS scores suggests that
while these pupils’ attitudes toward school were essentially the same,
their location in their respective groups may be indicative of pupil
behavioral differences during interactions with the teacher. It may be
that the concerned pupil exhibits more acceptable classroom behavior,
although this pupil's attitudes toward school may resemble those of a less
well behaved rejected pupil. Similar reasoning was developed by Willis and
Brophy (1974).

Significant differences among pupil groups were not demonstrated for
teacher kinesthetic behavior (see Table 6). However, group means and
ranges of kinesthetic behavior do warrant discussion. Hall (1966) defined
the close phase of “personal distance” as ranging from 1.5 feet (.45 m) to
2.5 feet (.76 m). The mean proxemic distance established by the teacher of
1.9 feet (.57 m) within the concerned group represents the inner boundary
of the sphere (see Table 3). This suggests that teachers within our sample
were interacting at a highly affective personal distance and were
exhibiting behaviors consistent with their initial identification of
concern. The complete range of teacher kinesthetic behavior within the
concerned group extends from actual physical contact of .1 to 10.0 feet
(3.04 m). The outer limit of the teachers’ behavior is still within what
Hall identified as the social-consultative sphere. The outer limit of the
social-consultative sphere was 12 feet (3.65 m).

Mean teacher kinesthetic behaviors within the accepted and indifferent
groups were 2.9 feet (.88 m) and 2.6 feet (.79 m), respectively.
Behavioral ranges within these groups were equally similar: .1 to 10.0
feet (3.04 m) and .1 to 11.0 feet (3.34 m), respectively. These mean
distances fall within the far phase of the “personal distance” sphere. The
range of the personal phase was 2.5 feet (.76 m) to 4 feet (1.21 m). This
personal distance relationship is not as positively affective as the
distances established by the teacher with the students labeled as
concerned.


Table 5
Cell Totals, Means, and Standard Deviations for Describe Your School
Inventory for All [illegible]

               Cell
Group          total          M              SD

Accepted       12             39.59          6.63
Concerned      11             33.94          6.74
Indifferent    [illegible]    [illegible]    7.88
Rejected       [illegible]    [illegible]    [illegible]


Table 6
Analysis of Variance for Kinesthetic Data for All Four Groups

Source            df     MS      F

Between groups     3   42.85   2.14*
Within groups    160   20.07

Total            163

* p < .10.

Teacher kinesthetic behavior within the 
rejected category is perhaps the most re- 
flective of the subtle behavioral differences 
that teachers may exhibit. The interper- 
sonal distance of 4.3 feet (1.31 m) is within 
the close phase of Hall's “social distance”
sphere. This sphere ranges from 3 feet (.91
m) to 7 feet (2.12 m). According to Hall
(1966), “impersonal business occurs at this
distance” (p. 121). The range and standard
deviation within this category reveal inter- 
active distances that extend beyond the so- 
cial-consultative sphere into the sphere of 
public distance of 25 feet (7.62 m). Teach-
ing as a social-consultative activity should
fall within the social-consultative sphere.
Our data suggest that in reaction to the re-
jected student, social-consultative activity 
is exactly what occurs. Teacher kinesthetic 
behavior toward pupils labeled as concerned, 
indifferent, and accepted appears to be of a
more personal quality. The child labeled as 
rejected does not appear to enjoy the non- 
verbal expression of positive affect through 
close physical proximity as does the child 
who is labeled concerned, indifferent, or 
accepted. The similarity in rejected and 
concerned students' attitudes toward school 
and the differences in teacher kinesthetic 
behavior might suggest that negative pupil 
classroom behavior patterns, maturity, ap- 
pearance, activity level, and so forth, con- 
tribute to a child's placement in the rejected 
category rather than the concerned cate-
gory.

A profile of the eight verbal variables 
across the four pupil categories is reported 
in Figure 1. Significant differences were not 
demonstrated on the selected scales of 
teacher verbal behavior between categories. 
However, if the total teacher sample that is 
being asked to implement a school philoso- 
phy is thought of as the unit of analysis, the 
overview of teacher-pupil behavioral styles 
lends itself to interpretation. In a school 
philosophically committed to a “child-cen-
tered” approach, particular teacher verbal
and nonverbal behavioral expressions would
be expected. The kinesthetic data reflect a
personal dimension to instruction for all but 
the pupils who are labeled as rejected. 
The verbal data seem to reflect individu- 
alized verbal interactive patterns. The low 
Lecturing scale scores indicate that the 
teachers were verbally interacting with pu- 
pils and not lecturing to them. This would 
be a logical teacher objective in a more
child-centered classroom environment 
(Katz, 1972, cited in Good, Biddle, & Bro- 
phy, 1975). The inactivity of the Managing, 
Rebuking, and Permissive Behavior scales 


[Figure 1: profile of verbal scale scores by pupil category. Scale labels recoverable from the plot: S (Question Source), Managing, Rebuking, Listening, Question Quality, Question Difficulty, LC (Lecturing).]

Figure 1. Observational schedule and Record 5V Verbal scale scores for selected pupil categories.
suggests that teachers are “in control” by
classroom consensus and not as a conse-
quence of constant verbal interaction with 
pupils. This seems to be a logical teacher 
objective in a more child-centered demo- 
cratic classroom environment. Scores on the 
Question Difficulty scale suggest that
teachers were interacting with divergent
questioning styles toward accepted pupils
and more convergently toward indifferent
pupils. The Source scale scores suggest that
pupils in the sample were asking many 
questions. The indifferent children seemed 
to initiate fewer questions, hence lower 
Source scale scores. The Source scale scores 
for rejected pupils may provide some insight 
into a pupil verbal behavior that results in 
teacher rejection, that is, too many questions 
relative to other pupils, the wrong kinds of 
questions relative to other pupils, or the 
wrong kinds of questions. This may be the
verbal pattern of the classic “kid who wants 
attention” and who annoys the teacher to the 
point of rejection both categorically and 
nonverbally. 
Nonsignificant findings in teacher be-
havioral investigations are not unexpected.
Brophy and Good (1974) suggest that 
teacher role expectations and consequent 
role-appropriate behavior might result in
similar behaviors toward most students. 
Further, they suggest that teacher manipu- 
lation of behavior might occur with the more 
easily self-monitored teacher verbal behavior 
(especially in the presence of a visitor). 
It may be that the less easily monitored
nonverbal behaviors, like teacher spatial
behavior, can be recorded and take on
meaning if an appropriate method exists for
such observation. The results seem to
support the continued use of Silberman’s
(1969) categories. Similarly, the observation
of teacher behavior toward selected pupils 
appears to be a useful strategy. We feel that 
while the data reported are interesting, the 
observational methodology is of particular 
significance to researchers interested in
adding new meaning to the results of teacher 
behavioral studies. Knowing how to act is
important to instructional effectiveness, but 
knowing in what interpersonal context to 
exhibit a particular behavior is a higher level 
of prescription for teachers. The situational 
frame contextual approach to teacher be- 
havioral analysis seems to provide for more 
systematic “situation” behavior analysis, 
which in turn should provide for more pre- 
cise situationally prescriptive teacher be- 
havioral training. 

Persistence and the Causal Perception of Failure:
Modifying Cognitive Attributions 


Gregory R. Andrews and Ray L. Debus 
University of Sydney, Australia 


In the initial phase of two complementary studies of the relation of persistence 
behavior to the causal perception of failure, temporal persistence and resis- 
tance to extinction were found to be positively related to the attribution of 
failure to insufficient effort and negatively related to attributions to ability 
and task difficulty by both male and female sixth graders. In Phase 2, the 
male pupils who least frequently attributed failure to lack of effort were ran- 
domly allocated to a control group or a social reinforcement group or a token 
plus social reinforcement attribution retraining group. At immediate and de-
layed posttests, experimental subjects attributed success and failure on the 
training task and two independent transfer tasks to effort significantly more 
than did controls. A significant increase from pretest levels on both persis- 
tence indexes paralleled the attributional change of experimental subjects. 
No difference was evident in the effectiveness of the two experimental treat- 
ments. Despite some attenuation on the transfer tasks, there was evidence of 
durability of training effects, and generalization of effects to an independent 
tester at a further 4-month follow-up posttest. The results provided strong 
support for the attribution model of achievement motivation and provide an 
empirical foundation for the rationale of attribution retraining programs. 


Major new insights into persistence be-
havior and other achievement-related be-
haviors have resulted in recent years from
the application of concepts of attribution
theory in the development of a cognitive
(attribution) model of achievement moti-
vation (Weiner, Frieze, Kukla, Reed, Rest,
& Rosenbaum, 1971). The key to this model
is the assumption that causal beliefs about
success and failure experiences have im-
portant consequences for subsequent feel-
ings, expectancies, and behavior. Heider
(1958) postulated that in attributing the
causality of events, individuals distinguish
(although somewhat unconsciously) between
stable and variable properties of causal at-
tributes as well as between internal and ex-
ternal properties. These loci of control and
stability dimensions of the four causal fac-
tors (ability, effort, task difficulty, and luck),
suggested as being the most general and sa-
lient of the causes of achievement outcomes
(Weiner, 1974), have been found to be most
important in understanding the affective
reactions to success and failure as well as the
changes in perceived probability of success
for future outcomes.
As a major index of motivation, persis-
tence behavior has been widely used to
support the general hypotheses of Atkinson's 
model of achievement motivation (Feather, 
1961, 1962, 1963). Persistence in achieve- 
ment-related contexts also becomes the key 
focus of the attribution model of achieve- 
ment motivation, being viewed as a function 
of the tendency to attribute failure to lack of 
effort. The attribution model postulates 
that it is the perceived stability of a cause 
that is the critical determinant of the ex- 
pectancy of success. If failure at an 
achievement task is believed to be caused by
stable factors such as a low level of ability or
task difficulty, failure on future trials of the
same or similar nature will be anticipated
since consistencies between past and future 
behaviors are expected (hence decrements 
occur in subsequent expectancy of success). 
If causality is attributed, however, to the 
variable factors (effort and/or luck), expec- 
tancy of success tends to remain constant or 
to increase since there is no expected con- 
sistency between present and future out- 
come. 

Direct evidence in support of the attri- 
butional interpretation is hitherto relatively 
meager, however, though considerable in- 
direct support has been derived by inference 
from the post hoc analysis of studies inves- 
tigating related phenomena (Weiner et al., 
1971). The first concern of this study then 
was to examine directly the relation between 
persistence at an achievement task and in- 
dividual differences in cognitive attribu- 
tional predispositions. 

The second phase of the study investi- 
gated the possibility of changing these cog- 
nitive attributional predispositions. The 
primarily descriptive research to date has 
demonstrated the relation of specific attri- 
butional schemata to approach and avoid- 
ance behavior, risk preference, and intensity 
of performance as well as to persistence in 
achievement contexts (Weiner et al., 1971). 
To the extent that such behaviors are held 
to manifestly influence the degree of learning 
in academic settings, it would appear that 
the attribution process may be a significant 
determinant of learning and performance in
the classroom. 

The pervasiveness of the attribution 
variable having been reasonably established, 
there have been several suggestions that at- 
tempts should be made to devise and test 
programs that are designed to induce ap- 
propriate, achievement-enhancing attribu- 
tions in children who typically give up in the
face of failure and display self-defeating at-
tributional schemata (Dweck & Repucci,
1973; Weiner, 1972). The present investi- 
gation reports such an attempt. 

Support for such programs has come from 
recent studies in clinical psychology (Ross, 
Rodin, & Zimbardo, 1969; Storms & Nisbett, 
1970), which have discussed the possibility 
of “attribution therapy” in the attenuation
of debilitating behavior symptoms and their
effects, by teaching new attributions for
these symptoms, and from studies of several
existing clinical techniques such as “cogni-
tive restructuring,” “rational-emotive
therapy,” and modification of “what clients
say to themselves” (Davison, 1966; Ellis,
1962; Meichenbaum & Cameron, 1974),
which, in essence, involve the reattribution
of events. The “therapy” in the present
study involved using reinforcement proce-
dures to train subjects to attribute their
success and failure experiences to effort, for
the literature on the consequences of causal
ascription of success and failure at achieve-
ment-related tasks highlights effort-oriented
schemata as the attributional key to moti-
vation.
The efficacy of this approach is evident in 
the study by Dweck (1975). She asked
whether altering attributions for failure 
would enable “learned helpless” children to 
deal more effectively with failure in an 
experimental problem-solving situation. 
Subjects were taught to take responsibility 
for failure and to attribute failure to insuf- 
ficient effort. By the end of 25 daily training 
sessions with mathematical problems, 
subjects in Dweck's attribution retraining 
group, who formerly showed marked per- 
formance deterioration when they failed, 
showed either negligible impairment after 
failure or actual improvement. A significant 
change in the pre- to posttraining scores on 
an Effort vs. Ability Failure Attribution 
Scale indicated that these subjects had al- 
tered their attributions for failure in situa- 
tions involving mathematics in general, in 
addition to changing their reactions to fail- 
ure in the experimental situation. 

As an initial test of, and model for, an at-
tribution retraining procedure, the Dweck 
study provides encouraging results. At 
several major points, however, Dweck's 
conclusions and interpretation depend on 
inference rather than on actual data. The 
initial attributional schemata of Dweck’s 
subjects were assumed, and this assumption
was only partially verified by their profiles 
on the Intellectual Achievement Responsi- 
bility (IAR) Scale (Crandall, Katkovsky, & 
Crandall, 1965). Although many of the 
studies contributing to the attribution model
of achievement behavior have utilized the
IAR as their attribution measure (Dweck &
Repucci, 1973; Weiner, Heckhausen, Meyer,
& Cook, 1972; Weiner & Kukla, 1970; Weiner
& Potepan, 1970), there has as yet been no
simple direct test to show how far the IAR is 
predictive of the causal schemata people 
employ in actual behavior in specific situa- 
tions. 

Similarly, Dweck assumes that the train- 
ing procedure did change the subjects’ causal 
attributions for failure and that that was 
responsible for their improved performance 
in the face of failure. Confirmation of this 
assumed change gained some support from 
the significant increase in the choice of effort
alternatives from pre- to posttraining on the 
Effort vs. Ability Failure Attribution Scale 
whose items were specific to situations in- 
volving mathematics, the subject of the 
training procedure, though the assumed 
changes did not generalize to the wider range 
of items included in the IAR. 

As a test and model of an attribution re- 
training procedure, the present investigation 
may avoid such assumptions with its more 
direct situational assessment and monitoring 
of subjects' attributional schemata on in- 
dependent behavioral tasks. In addition, 
the experimental phase of the present study 
presents a more direct test of the core hy-
pothesis of the attribution model of
achievement motivation: If causal infer- 
ences influence and in part determine sub- 
sequent achievement behavior, then 
changing an individual’s attributional re-
sponse set results in changes in the individ-
ual’s subsequent achievement behavior. 


Phase 1 
Method 


Subjects and procedure outline. The initial sample 
for the study consisted of 71 female and 87 male chil- 
dren, the entire sixth-grade enrollment of two urban 
elementary schools in Sydney, Australia. Mean ages 
were 11 years 8 months for girls and 11 years 11 months 
for boys. 

Paper-and-pencil-attribution measures were first 
administered to the intact classroom groups by the first
author in the absence of class teachers. Individual 
subjects were later called to an office used as the testing 
room where they made attributional judgments in re- 
sponse to a contrived sequence of success and failure
experiences at an achievement task (circle design task) 
and were tested for temporal persistence and resistance 
to extinction on an independent achievement task 
(Perceptual Reasoning Test). Subjects were then 
thanked for their cooperation and returned to class.

Attribution measures. The IAR scale has figured as 
an attribution measure in a number of studies (Dweck, 
1975; Dweck & Repucci, 1973; Weiner et al., 1972; 
Weiner & Kukla, 1970; Weiner & Potepan, 1970). In 
these investigations, intellectual achievement respon- 
sibility refers to the readiness to attribute successful or 
failed actions to oneself (ability or effort expenditure) 
rather than to external sources, that is, the extent to 
which one perceives oneself as a causal agent in behav- 
ior-outcome sequences. Separate subscores were de- 
rived for groups of items that categorize the internal 
alternatives on this forced-choice 34-item scale into 
those that attribute the outcome to the ability of the 
subject and those that attribute the outcome to the 
subject’s motivation and effort. By using this addi-
tional scoring distinction, subscale scores of I+A
(ability), I+E (effort), I−A (ability), and I−E (effort)
were generated for each subject in addition to the usual
I+ and I− scores.

As the IAR scale calls only for a forced choice between 
an internal and external response on each item, subjects 
are not required to differentiate between possible in- 
ternal alternatives of attributing outcomes to effort 
factors or to ability factors, which makes it difficult to 
assess the relative emphasis that an individual places 
on these two factors. The IAR items were therefore 
further supplemented by items of the Effort Attribution 
Scale (EAS), which was devised as a measure of the 
tendency to attribute both success and failure to effort. 
Patterned on the general format of the IAR and 
employing similar achievement-related situations, the 
EAS scale consists of five items referring to failure sit- 
uations and five referring to success situations. Each
item presents two stems, one attributing the causal
outcome to an effort variable, the other to an ability
variable. Effort responses are scored to produce sub- 
scales E+ and E-. The following are two typical items 
illustrating failure and success outcomes and ability and 
effort internal stems: “When you win at a game, is it
usually (a) because you play very well, or (b) because
you try your hardest? When you do badly on a test, is
it usually (a) because you haven't tried to do well, or (b)
because you don’t have the ability to do well?” 

A more direct form of attribution assessment was 
used in relation to specific behavioral tasks. The 
measure, which may be used in relation to any task, was 
seen as the major attributional measure for both phases 
of the investigation. The instrument is a simpler 
variation of that used by Nicholls (1975). Subjects 
indicate their attributions by monitoring any of four 
half-moon cardboard disks that are encased in a larger 
purse. Each half-disc can be exposed to fill from 0° to 
180° of space outside the casing. One side of the casing 
is labeled “I succeeded because” and the other side, “I
failed because.” The four half-discs are differently 
colored and labeled, respectively—yellow: “I had the 
ability”/“I didn't have the ability” (reverse side); red:
“the task was easy”/“the task was difficult” (reverse
side); blue: “it was good luck”/“it was bad luck” (re-
verse side); green: “I tried hard”/“I didn’t try hard
enough” (reverse side). Subjects are asked to think 
carefully about why they succeeded or failed at each 
trial of the achievement tasks and then to indicate their 
reason(s) by using the instrument. The attribution
measure allows a subject to indicate any one reason 
(attribution) or any combination of the four. This frees 
the subject from having to evaluate the contribution of 
any factor(s) that did not in fact play a part in the actual 
attributional response. Furthermore, the use of this 
spatial comparison procedure was undertaken on the 
assumption that the operations required to display 
which factor(s) was more influential are simpler than 
the operations required to rate independently the in- 
fluence of each factor. On each half-disc the identifying 
label (e.g., “I tried hard”) was printed 10 times, like sun
rays, at equally spaced intervals. By counting the 
number of labels exposed on each disc, a value from 0 
to 10 was assigned to each causal factor for any one trial. 
Scores from the attribution measure cannot be assumed 
to form a ratio scale but only to provide fairly broad 
indications of the degree of importance that individuals 
place on the separate causal factors in explaining their 
outcomes. For purposes of analysis and for meeting the 
assumption of independent measures required for cor- 
relational purposes, total scores were transformed to a 
6-point scale. 
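
The scoring rule just described can be sketched in a few lines of Python. This is only an illustration: the report does not say exactly where the 10 labels sit on each half-disc, so the sketch assumes they are centered in 10 equal 18° sectors and count as exposed once their center has cleared the casing. The function names are hypothetical, and the later transformation to a 6-point scale is not specified in the text, so it is not attempted here.

```python
from typing import Dict


def labels_exposed(exposed_degrees: float, n_labels: int = 10, span: float = 180.0) -> int:
    """Count the labels visible when `exposed_degrees` of a half-disc is showing.

    Assumption (not from the report): labels are centered in equal sectors.
    """
    if not 0.0 <= exposed_degrees <= span:
        raise ValueError("exposure must lie between 0 and 180 degrees")
    sector = span / n_labels
    centers = [(i + 0.5) * sector for i in range(n_labels)]
    return sum(1 for center in centers if center <= exposed_degrees)


def score_trial(exposure: Dict[str, float]) -> Dict[str, int]:
    """Map each causal factor's exposed angle to a 0-10 attribution score."""
    return {factor: labels_exposed(deg) for factor, deg in exposure.items()}
```

A subject who pulls the green (effort) disc fully out and leaves the others hidden would be scored by calling `score_trial({"ability": 0, "task": 0, "luck": 0, "effort": 180})`.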

The achievement task used to elicit attributional 
predispositions in the first phase, and employed as a 
transfer task in the experimental phase, was the circle 
design task. Designed by the first author, the task 
shares properties with the more widely used block de-
sign tasks in that it requires the subject to analyze de-
signs into component parts and then synthesize those
parts into a whole. The task consists of eight variously 
colored cardboard circular shapes, each with different 
cutout sections. By correctly sequencing the circular 
cards, subjects replicate two-dimensional stimulus de- 
signs presented on white cards. Success or failure on 
successive trials of the task was manipulated by the 
experimenter principally by extending or shortening the 
40-sec time limit by up to 8 sec and by manipulating the 
difficulty of the stimulus designs. The subject's level
of ability displayed on three graded practice trials was
used to guide the selection of designs for the test trials.
Insofar as possible, subjects were presented with designs
commensurate with their ability at the task so as to
guard against biased task attributions, and the outcome
was controlled by varying the time limit.

At each of the testing occasions in which attribution 
tasks were administered, subjects experienced three 
success and three failure trials of each particular task;
success and failure were randomly determined with the
proviso that no more than two consistent trials could be
scheduled in a row and that the last trial was always a
success trial.
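
One way to generate an outcome schedule that satisfies these constraints is simple rejection sampling, sketched below. The sketch reads "no more than two consistent trials in a row" as "no more than two identical outcomes in a row"; that reading, the function names, and the "S"/"F" coding are assumptions for illustration and are not taken from the report.

```python
import random
from typing import List, Optional


def is_valid(schedule: List[str]) -> bool:
    """Check the two provisos: final trial is a success, no run of three."""
    if schedule[-1] != "S":
        return False
    return all(
        not (schedule[i] == schedule[i + 1] == schedule[i + 2])
        for i in range(len(schedule) - 2)
    )


def draw_schedule(rng: Optional[random.Random] = None,
                  n_success: int = 3, n_failure: int = 3) -> List[str]:
    """Shuffle the outcomes until a schedule meets both provisos."""
    if rng is None:
        rng = random.Random()
    trials = ["S"] * n_success + ["F"] * n_failure
    while True:
        rng.shuffle(trials)
        if is_valid(trials):
            return list(trials)
```

Rejection sampling is adequate here because only six trials are involved and many orderings satisfy both provisos.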

Several criteria governed the selection and design of 
the tasks used for attributional measurement. It was 
necessary to use tasks that were relatively novel (so that 
subjects would be unlikely to approach them with re-
sponse histories biased by particular experiences) in
addition to being susceptible to ambiguous outcome
perception; in particular, the use of equal frequencies
of success and failure also contributed to the perception
of the task's being of intermediate difficulty (achieved
by manipulating outcomes so p = .5 and by pacing the
task stimuli so that any particular item was not ob- 
viously easy or difficult for the individual subject), so 
that they elicited less biased attributional responses 
(Kelly, 1967). 

Persistence measures. The persistence task chosen 
for the study was a modified version of Feather's (1961, 
1963) Perceptual Reasoning Test. The test consisted 
of three items. Each was a line diagram, multiple copies 
of which were stacked in piles and placed in each corner 
of a large triangular envelope. The child's task was to 
trace over all the lines in the diagram according to two 
rules: The child was not permitted either to lift the pen 
from the figure during each attempt or to trace over any 
line twice. 

A first item card was taken from the appropriate pile, 
and subjects were told that they could try it as many 
times as they liked if they were unsuccessful but that 
they could proceed to the others whenever they chose 
to. The first item was insoluble, so that each trial the 
subject took ended in failure. The others were soluble, 
and the test ended when each subject solved these. 

Two measures of persistence were derived from the 
procedure: the total time spent from the time the child 
started the first item to the point at which he/she de- 
cided to try the next item in the set, and the number of 
trials taken at the first item before turning to the al- 
ternative task. The former measure is akin to the 
typical temporal measure used in studies of persistence, 
and the latter is analogous to resistance to extinction. 
Sets of parallel designs were developed for each testing 
occasion.

The study aimed at replicating and extending pre- 
vious findings that linked different attributional pre- 
dispositions to persistence. Previous support for the 
hypotheses had come from generalizing the attribu- 
tional framework to the analysis of experimental ex- 
tinction as well as to the general conception of temporal 
persistence (Weiner et al., 1971). However, the con- 
ceptual independence of the measures was not reflected
in their empirical independence in the present study. 
In view of the significant relation between the temporal 
persistence and resistance to extinction scores (r = .81, 
males; r = .61, females, both ps <.01), the measures are 
best interpreted as alternative indexes of persistence.
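
For readers who want to check a relation of this kind on their own data, the product-moment correlation between the two persistence indexes can be computed as sketched below. The function name and the example values are hypothetical; no subject-level data from the study are reproduced here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores: seconds spent on the insoluble item, and trials attempted.
seconds_on_item = [30.0, 45.0, 60.0, 90.0]
trials_attempted = [2, 3, 4, 6]
r = pearson_r(seconds_on_item, trials_attempted)
```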


Results 


The central hypotheses under investiga- 
tion in Phase 1 concerned the differential 
relation of persistence to attributions to the 
four Heiderian causal factors on a behavioral 
measure and two paper-and-pencil mea- 
sures. Table 1 presents the obtained prod-
uct-moment correlation coefficients between
persistence and attributions made on the
circle design task.
The results reveal a markedly similar
pattern for both sexes of a significant rela- 
tion between causal attributions for failure 
and persistence behavior. In accordance 
Table 1
Relation of Causal Attributions for Failure to
Persistence Measures

Attribution      Temporal persistence     Resistance to extinction
variable         Males      Females       Males      Females
Ability —p* 024^ 0299 ngs
Effort 62° — .ba** 4** -80**
Task —239*  —34** =% = 499%
Luck             −.13       −.10          −.13       −.22
Note. Male n = 87; female n = 71.
* p < .05.
** p < .01.


with prediction, attribution of failure to in- 
sufficient effort was found to be positively 
related to persistence, whereas attribution 
of failure to the stable elements of ability and 
task difficulty were both negatively related 
to persistence. The hypothesized positive 
relation between luck-attributed failure and 
persistence was not supported. The results 
thus give direct support to the major per- 
sistence-attribution hypotheses. In addi- 
tion, the results provide a further rationale 
for the Phase 2 program of training subjects 
to make effort attributions for their failure 
experiences as a strategy to influence their 
persistence behavior. 

The IAR and EAS subscales showed only 
relatively limited and weak relations with 
persistence and with the attributional re- 
sponses made for success and failure expe- 
riences on the circle design task. Apart from 
the positive relation between persistence and 
the (E−) IAR scores for females (r = .28, p
< .01; r = .26, p < .05), no other IAR sub-
scales showed significant associations with
persistence. Scores on the (E−) subscale of
the EAS were significantly correlated with
persistence measures in the male sample (r
= .21, p < .05; r = .24, p < .05) but not in the
female sample. For girls, the IAR subscores
showed significant relations with corre-
sponding ability and effort attributions (r =
.31 and .37, respectively, p < .01) in failure
situations on the circle design task; for boys, 
only the effort (--) scale showed a positive 
relation (r = .31, p < .01) with effort attri- 
butions on success trials. The negligible
coefficients for the other subscales indicate 
that with the present sample, the two 
methods of assessing attributional predis-
positions are in fact not measuring the same
variable. Although the minimal degree of
relations may be comparable with what
would be theoretically expected for specific
situations, the results would support the
suggestion that the use of attributional
patterns behaviorally assessed for specific
situations may often be more useful than a
general measure such as the IAR as the
means of attributional assessment.


Phase 2 
Method 


The experimental phase of the investigation involved
an attempt to influence achievement behavior by in-
ducing effort-oriented attributional schemata in
subjects by reinforcement procedures. The research
literature suggests the combined use of social rein-
forcement and of tokens backed up by tangible rein-
forcers as likely to be the most effective reinforcing
stimuli in situations in which knowledge of what stimuli
are reinforcing for individuals is absent, as was the case
in the present investigation (Broden, Hall, Dunlap, &
Clark, 1970; O'Leary & Becker, 1967). A social rein-
forcement condition, the form of reinforcement most
readily available to teachers, was also included, as this
procedure provides a possible approach to modifying
attributional predispositions that could be adapted for
within-class settings by teachers.
Subjects and procedure outline. Subjects for the
experimental phase were 43 males who were that half
of the total male sample who least frequently attributed
failure to lack of effort on the circle design task in Phase
1. One subject was randomly discarded, which left a
total of 42, who were then randomly allocated to the
three treatment conditions: a control group, a social
reinforcement group (SR), and a token plus social re-
inforcement group (TR+SR).
The trainer visited each classroom at the beginning
of the phase and explained that he had selected a ran-
dom group of boys to see again, because limited time
prevented him from being able to see everyone a second
time. Subjects were then called to the testing


posttest session we tested control subjects first, then SR
subjects, then TR+SR subjects, and within each con-
dition, subjects were alternately drawn from different
classrooms insofar as it was possible.
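
The allocation step described above (randomly discard one of the 43 subjects, then randomly assign the remaining 42 to three conditions) can be sketched as follows. Equal group sizes are an assumption made for this illustration; the passage gives only the total of 42. The function name and subject identifiers are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Optional, Sequence


def allocate(subject_ids: Iterable[int],
             conditions: Sequence[str] = ("control", "SR", "TR+SR"),
             rng: Optional[random.Random] = None) -> Dict[str, List[int]]:
    """Shuffle subjects, drop any surplus at random, and deal equal groups."""
    if rng is None:
        rng = random.Random()
    pool = list(subject_ids)
    rng.shuffle(pool)
    # After shuffling, the leading elements are a random surplus to discard.
    surplus = len(pool) % len(conditions)
    kept = pool[surplus:]
    size = len(kept) // len(conditions)
    return {
        name: kept[i * size:(i + 1) * size]
        for i, name in enumerate(conditions)
    }


groups = allocate(range(1, 44), rng=random.Random(1))
```

With 43 identifiers and three conditions, one subject is dropped and each condition receives 14.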
Control subjects received no training but received
immediate posttests on the training task, a near transfer
task, a more remote, school-related transfer task, and
the Perceptual Reasoning Test, as was the case in the
other conditions after training. Subjects were then
thanked for their cooperation, reminded that they
would be coming again the following week, and returned


to their class, armed with a copy of the combined IAR 
and EAS scales, which they completed in class. 

Two further achievement tasks for which attributions 
were assessed were developed for this phase. The block 
design task, designed for use as the training task, in- 
volves the ability to analyze geometric designs into 
component parts and then to synthesize those parts into 
a whole. The task materials consist of nine wooden 
cubes, each with a full red, yellow, blue, and white side, 
a diagonally split red and white side, and a diagonally 
split blue and yellow side. Subjects were given 30 sec 
on each trial to replicate two-dimensional stimulus 
block designs presented on white cards. The outcome 
was manipulated by the same strategy as that described 
for the circle design task. The circle design task was 
used in Phase 2 as the near transfer task. Similar in 
concept and execution to the training task, it requires 
the additional skills of sequencing manipulable parts 
and the perception of various proportions. The simi- 
larity of function provided a basis for expecting gener- 
alization of treatment effects. 

The other task designed for use in Phase 2 was an 
anagrams task, which functioned as a remote transfer 
task to gauge the wider generalizability of treatment
effects. Whereas the other tasks are perceptual-motor
tasks, this task functions as a cognitive activity and thus 
presents subjects with more distinctive and absolute 
success and failure outcomes, which may be held to 
encourage attributions to the stable elements. Fur- 
thermore, as the task simulates one type of achievement 
activity that subjects are likely to experience in the 
classroom situation, it was regarded as a suitable test 
of the generalizability of treatment effects. The task 
consisted of words of up to six letters, in which the letter 
sequence is rearranged to varying degrees. Outcome
was manipulated by the difficulty of the anagrams,
which were carefully pretested with a matched sam-
ple.

Training procedure. The training procedure was 
aimed at inducing effort-oriented schemata in subjects. 
Subjects in the two conditions that had different 
methods of reinforcement engaged in a series of trials 
at an achievement task (block design task) on which the 
outcomes and frequency of success and failure were 
contrived by the trainer. Subjects were contingently 
reinforced for making effort attributions. 

Subjects were seated in the testing room opposite the 
experimenter and were told that they were going to do 


Training instructions were then given for the block
design task, and subjects were instructed in the use of 
the attribution box, located between the trainer and the 
subject, on which causal attributions were monitored. 
Four appropriately colored and labeled panels, corre-
sponding to the four causal factors used on the attri-
bution measure, were located under the heading of Success
on the left side of the box and four on the right side
under the heading of Failure. The success and failure
sides of the box were separated by a central panel of
white lights, which were activated by a hand switch.
For the purpose of isolating target behavior (Le alogle- 
pissed subjects were i on the 
‘tor attributional responses to outcomes 
training-task trials, which were registered when subjects 


pressed a button on any one panel and an appropriately 
colored light in that panel was activated. 

In the social reinforcement condition, subjects were 
trained to criterion or given up to six blocks of 10 trials 
on the training task. To reach criterion, subjects had 
to make effort attributions for four out of five trials, for 
both success and failure in each block. Each subject
received a minimum number of 40 trials, unless criterion
was reached in the first block, when training was ter-
minated after 16 consecutively reinforced trials. Oth-
erwise they continued until criterion was reached after
Block 4, 5, or 6. Training was terminated after 60 trials,
whether criterion was reached or not.
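
The criterion rule stated above (effort attributions on four out of five trials, for both success and failure, within a block) can be expressed as a small check. This is a minimal sketch under assumed conventions: the function name `reached_criterion`, the representation of a trial as an `(outcome, attribution)` pair, and the attribution labels used in the example are hypothetical and do not come from the article.

```python
def reached_criterion(block, required=4):
    """Return True if a block meets the effort-attribution criterion.

    `block` is a list of (outcome, attribution) pairs for one 10-trial block,
    where outcome is "success" or "failure". The criterion requires at least
    `required` effort attributions for each outcome type.
    """
    for outcome in ("success", "failure"):
        effort_count = sum(
            1 for o, attribution in block
            if o == outcome and attribution == "effort"
        )
        if effort_count < required:
            return False
    return True


# Hypothetical block: four of five effort attributions for each outcome type.
example_block = (
    [("success", "effort")] * 4 + [("success", "ability")]
    + [("failure", "effort")] * 4 + [("failure", "luck")]
)
meets = reached_criterion(example_block)
```

The sketch only evaluates a single block; the rules on minimum and maximum numbers of trials would sit in whatever loop presents the blocks.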

Each block consisted of five success and five failure 
trials, randomly scheduled with the restriction that no 
more than two success or two failure trials could appear
in a row. After each trial, subjects Loser e 
outcome by pressing the appropriate button on the at-
tribution box and stating their attribution orally. All
effort attributions were contingently reinforced verbally
with the following variants used: “That's
good!”; “Very good (John)!”; “OK!”; “Good!” Verbal

in a casual but pleasant 


rectly at the subject. 
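
The block schedule described above (five success and five failure trials in random order, with no more than two of the same outcome in a row) can be generated by reshuffling until the run restriction is satisfied. The following is an illustrative sketch only; the function names and the rejection-sampling approach are assumptions, since the article does not say how the schedules were produced.

```python
import random


def has_long_run(seq, max_run=2):
    """Return True if any outcome repeats more than `max_run` times in a row."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return True
    return False


def make_block(rng, n_each=5, max_run=2):
    """Shuffle n_each success and n_each failure trials until no run is too long."""
    trials = ["success"] * n_each + ["failure"] * n_each
    while True:
        rng.shuffle(trials)
        if not has_long_run(trials, max_run):
            return list(trials)


# Hypothetical usage: one 10-trial block from a seeded generator.
block = make_block(random.Random(1))
```

Reshuffling is the simplest way to honor the restriction; it always terminates in practice because valid orders (for example, strict alternation) exist.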

For subjects who did not spontaneously make effort
attributions, limited cuing was permitted. A maximum 
of three cues, one in each of the first three blocks, were 
allowed. nly after 


trying pretty hard that 
C forced trial in Blocks 2 and 3, 
time.” For the first reinf eer E 


sporia) dixe For a failure trial, 
given in place of the usual variant, Fora (Jela), w 


panel of white lit up on the at- 
that if the ERIR indicated that they hed won a token 


each 
During trainint, i the center panel of lights by 


outlined previously. 
same e. de child was instructed to count his tokens, 
which were then exchanged for the tangible(s) of his
choice that the tokens could purchase. 

Delayed posttest. In the delayed posttest (DPT) 
session administered 7-9 days later, subjects were 
seated in the testing room opposite the experimenter 
and were informed that they were going to do all the 
tasks again. Subjects were then administered parallel 
forms of each task. At the end they were thanked for 
their cooperation and returned to their classes. 
Subjects from the control and SR conditions at the end 
of this session were given a token gift, designed to 
counterbalance possible reactive effects of one group
of subjects (TR+SR) being perceived as receiving spe- 
cial treatment. 

Follow-up posttest. In the event of significant ef- 
fects being maintained at the delayed posttest, provision 
was made for a follow-up to determine the long-term 
durability of treatment effects and to explore the extent 
of generalization to tasks administered by a tester other 
than the experimenter who conducted the training. 
This was conducted 4 months after the initial delayed 
posttest. Subjects were randomly assigned to the 
trainer and the independent tester, and all subjects were 
administered further parallel forms of all tests. 


Results 


Attribution measures. The general hy- 
potheses of the second phase of the investi- 
gation predicted that trained subjects would 
demonstrate substantial (and perhaps en- 
during) change in their attributional be- 
havior regarding effort ascription compared 
with control subjects on the training task and 
two associated transfer tasks. Further, it 
was hypothesized that the predicted modi- 
fication of effort ascriptions to failure dem- 
onstrated by experimental subjects would be 
reflected in increased persistence on the
Perceptual Reasoning Test.

Table 2 presents the mean raw scores and
standard deviations for the three attribu-
tion-achievement tasks at IPT and DPT.

Block design task. As training required
a novel task, no pretest measure was given on
the block design task; attribution scores on
the pretest on the circle design task were
used as the covariate in an examination of
training effects to nonreinforced posttreat-
ment trials in a 3 X 2 (Treatments X Occa-
sions) analysis of covariance factorial de-
sign.

For failure trials, the analysis of variance
of adjusted effort-attribution scores for IPT
and DPT revealed a significant main treat-
ment effect, F(2, 38) = 19.25, p < .01.
Newman-Keuls analysis of treatment means
averaged over occasions indicated that at
immediate and delayed posttests, both SR
and TR+SR subjects exhibited a greater
incidence of effort attribution for failure
than did control subjects (p < .01). The
difference between the adjusted means of
SR and TR+SR subjects was not signifi-
cant.

Paralleling the failure-trial results, a main
treatment effect for success trials was
found, F(2, 38) = 3.91, p < .05. Analysis of
treatment means revealed both SR and
TR+SR subjects attributed their success on
the training tasks to effort at a significantly
greater incidence than did control subjects


Success Failure 
Control SR TR - SR Control SR TR +SR 
"Task M SD M SD M SD M SD M SD M 

Block design 

Er 10.71 6.98 22.79 9.21 20.53 10.18 3.07 441 19.35 12.07 19.85 

i PT 4 921 806 2221 9.08 16.85 10.66 3.21 5.42 22.57 9.20 15.71 
Circle design 

Ps 1142 942 11:35 8.33 1264 1193 307 426 407 490 3.57 

D T 14.35 8.54 22.86 11.88 2450 9.22 492 6.52 17.50 1122 16.57 

PT 1264 7.03 22.79 6.97 21.00 10.74 1.71 3.66 1892 1112 1871 

Anagrams 
P 13.00 9.91 17.14 10.69 18.86 10.03 442 538 15.71 12.22 15.43 
11.14 39.67 2107 10.03 20.79 9.81 4.36 4.79 16.00 1310 15.78 


Note. SR = social reinforcement; TR + SR = token plus social reinforcement; IPT =
immediate posttest; DPT = delayed posttest.
(p < .05). Again, no difference was evident
between the two training methods. 

Circle design task. Results for the circle 
design task, the near transfer task, were an- 
alyzed in a 3 X 3 (Treatments X Occasions) 
analysis of variance factorial design with 
repeated measures on the second factor. 
Significant main treatment and occasions 
effects and interaction were revealed in the 
analysis of variance of effort-attribution 
scores for failure trials, F(2, 39) = 13.39, p <
.01; F(2, 78) = 36.38, p < .01; F(4, 78) = 9.59,
p < .01, respectively. Tests of simple main
effects for treatments conducted within each 
occasion revealed significant differences 
among treatment groups at both IPT and 
DPT, F(2, 87) = 11.1, p < .01, and F (2, 87) 
= 14.16, p < .01, respectively. The New- 
man-Keuls analysis of treatment means at 
both IPT and DPT indicated that the 
training groups displayed a significantly 
greater incidence of effort attributions for
failure than did control subjects (p < .01).
As in the prior analyses, no difference was 
found between the two experimental con- 
ditions. 
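
As a reading aid, the degrees of freedom reported for these analyses can be checked against the usual textbook formulas for a groups-by-occasions design with repeated measures on occasions. The sketch below is an assumption-laden illustration: the function name `mixed_design_df` is hypothetical, and the formulas are the standard ones for this design (with each covariate removing one between-subjects error degree of freedom), not something stated in the article.

```python
def mixed_design_df(n_subjects, n_groups, n_occasions, n_covariates=0):
    """Degrees of freedom (effect, error) for a Treatments X Occasions design.

    Treatments is a between-subjects factor; Occasions is a repeated measure.
    Each covariate is assumed to cost one between-subjects error df.
    """
    treat_df = n_groups - 1
    occ_df = n_occasions - 1
    between_error = n_subjects - n_groups - n_covariates
    within_error = occ_df * (n_subjects - n_groups)
    return {
        "treatments": (treat_df, between_error),
        "occasions": (occ_df, within_error),
        "interaction": (treat_df * occ_df, within_error),
    }


# 42 subjects, 3 treatment groups, 3 occasions, no covariate (circle design task).
circle_df = mixed_design_df(42, 3, 3)
# treatments (2, 39), occasions (2, 78), interaction (4, 78)

# 42 subjects, 3 groups, 2 occasions, one covariate (block design task).
block_df = mixed_design_df(42, 3, 2, n_covariates=1)
# treatments (2, 38)
```

These values agree with the F(2, 39), F(2, 78), and F(4, 78) reported for the circle design task and the F(2, 38) reported for the covariance analyses.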

The analysis of variance for success trials 
on the circle design task also revealed sig- 
nificant main treatment and occasions ef- 
fects and interaction, F(2, 39) = 4.05, p < .05;
F(2, 78) = 13.87, p < .01; F(4, 78) = 5.15, p
< .01, respectively. Simple main effects
tests for treatments within occasions re-
vealed significant treatment differences at
both IPT, F(2, 80) = 5.70, p < .01, and DPT,
F(2, 80) = 3.51, p < .05. Treatment means
within each occasion were further analyzed 


Table 3
Raw Score Means and Standard Deviations for Persistence Criteria on the Perceptual
Reasoning Test
by the Newman-Keuls procedure. At IPT,
both SR and TR+SR subjects displayed
significantly higher effort attributions for
success trials than did control subjects (p <
.01). There was no difference between the
two experimental groups at IPT. At DPT,
only subjects trained in the SR condition
displayed significantly higher effort attri-
butions than the control subjects (p < .01).
Furthermore, the SR subjects' incidence of
effort attributions at DPT was also greater
than that of TR+SR subjects (p < .05),
which did not differ from that of the control 
group. 

Anagrams task. The anagrams task 
provided a remote transfer task to gauge the 
generalizability of training effects. The 
specific hypotheses of transfer effects were 
tested in a 3 X 2 (Treatments X Occasions)
factorial design for analysis of covariance 
with repeated measures on the second factor. 
Pretreatment scores on the circle design task 
were used as the covariate. 

The analysis of variance of adjusted ef- 
fort-attribution scores for failure trials re- 
vealed a highly significant treatment main 
effect, F(2, 28) = 5.58, p < .01. Treatment
means averaged over both occasions, ana-
lyzed by the Newman-Keuls procedure, re-
vealed that at both IPT and DPT, both SR
and TR+SR subjects exhibited a greater
incidence of effort attributions for failure
trials than did control subjects (p < .01),
with no evident differential effectiveness in 
the two training procedures. 

A significant main treatment effect, F(2, 
38) = 4.34, p < .05, was found also in the


istence Criteria on the Perceptual 


Reasoning Test. pe 
R+ 
caa eas. SD 


Control SR — ——4n 
vro SIM SD 
Measure M SD M 
"Temporal persistence 30.92 73.19 31.96 
Protest doy pod yr 9869 117.79 — 6541 
T s S ; 92.01 45.95 
DPT AURI ION: 04:9. 72:19 
Resistance to extinction 9 46 1.51 49 
Pretest 157 ie M 1.55 271 1.32 
IPT 1.64 2.99 147 2.51 1.39 


Note. SR = social reinforcement; TR + SR = token plus social reinforcement; IPT =
immediate posttest; DPT = delayed posttest.
analysis of variance of adjusted effort-at-
tribution scores for success trials on the an-
agrams task. Analysis of adjusted treatment
means revealed that, as predicted, at both
IPT and DPT occasions, both experimental
groups rated success at the task as caused by
effort expenditure to a significantly greater
degree than did control subjects (p < .01).
There was no significant difference found
between the two treatments.

No significant changes were found on the 
six IAR subscales or the E+ subscale of the 
EAS on readministration at IPT. The SR 
subjects, however, were found to have sig- 
nificantly higher mean scores than did con- 
trols on the E— subscale of the EAS, F(2, 39) 
= 6.64, p < .05, Scheffé method, which in- 
dicated that in addition to changing their 
attributional reactions to failure in the ex- 
perimental situation, they displayed a ten- 
dency to attribute failure to insufficient ef- 
fort in relation to a more general range of 
achievement situations. 

Persistence measures. The mean raw 
scores and standard deviations for persis- 
tence indexes on the Perceptual Reasoning 
Test are presented in Table 3.

Analysis of variance of persistence scores 
yielded, in addition to a significant main 
effect for occasions (p < .01), a Treatments
X Occasions interaction which was signifi- 
cant for one measure (resistance to extinc- 
tion), F(4, 78) = 2.94, p < .05, and ap- 
proached significance for the other (tempo- 
ral persistence), F(4, 78) = 1.96. Tests for 
simple main effects for occasions within 
treatments, carried out to examine the spe- 
cific increases predicted (Winer, 1971, p. 
384), revealed significant differences among 
occasions for both persistence measures for 
the SR condition, F(2, 78) = 6.88 and 10.95, 
both ps <.01, and for the TR+SR condition, 
F(2, 78) = 4.68, p < .05, and 10.11, p < .01. 
Further comparisons of the occasions means
within each condition, carried out by the
Newman-Keuls procedure, showed that for
both persistence scores, subjects in the SR
condition displayed significantly higher
levels at both posttreatment occasions,
compared with their pretest levels (p < .01).
In addition, the TR+SR subjects exceeded 
their pretest persistence scores on both in- 
dexes at IPT (p < .01) and maintained this 
significant difference at DPT for the resis-
tance to extinction measure. 

Follow-up posttest. In an attempt to test 
the long-term durability of treatment effects 
and to explore whether these effects would 
generalize to testers other than the experi-
menter who administered the training, we 
conducted a follow-up posttest 16-17 weeks 
after the initial delayed posttest. Control 
and experimental subjects were randomly 
allocated between the trainer and an inde- 
pendent tester unknown to the children. In 
determining the durability of treatment ef- 
fects, as no difference had been found be- 
tween the two training treatments, results 
for these two groups were pooled in the main 
analyses. Raw score means and standard 
deviations for tasks administered at the 
follow-up posttest are presented in Table 4.


Analysis of variance revealed that exper- 
imental subjects still made significantly 
more effort attributions than did controls for 
both success and failure trials on the training 
task (p < .05), for failure trials on the circle 
design task (p < .01), and for success trials 
on the remote transfer anagrams task (p < 
.05) at follow-up. Analysis of variance of 
persistence scores revealed no significant 
main effects or interaction. 

The question of generalizability of treat- 
ment effects to persons other than the 
trainer was examined in a 2 X 3 (Testers X
Treatments) analysis of variance factorial 
design. Parallel results were found in the 
analysis of success trial scores on the training 
task and anagrams tasks. For subjects re- 
tested by the trainer, TR+SR subjects still 
made significantly more effort attributions 
than did either control or SR subjects. For 
the independent tester, the SR subjects still 
made significantly more effort attributions 
than did control or TR+SR subjects. 

For failure trials on the circle design task, 
both experimental groups still attributed 
failure to insufficient effort significantly 
more than did controls (p < .01), and this 
difference showed no differential strength 
between the two testers. 


Discussion 


The attributional analysis of persistence 


Table 4
Raw Score Means and Standard Deviations of Effort Attributions for Success and Failure Trials
on the Three Achievement Tasks and Temporal Persistence and Resistance to Extinction on the
Perceptual Reasoning Test at Follow-up

                              Control            SR            TR + SR
Measure                      M      SD       M      SD       M      SD
Block design
  Success                   8.50   8.98    15.35  11.67    17.85   9.55
  Failure                   8.00  10.19     9.14  13.25    13.92   9M
Circle design
  Success                  11.57  11.18    15.36   9.89    17.35   7.99
  Failure                   4.43   6.38    14.00  10.04    14.86  11.10
Anagrams
  Success                   5.21   5.57    13.21  11.36    13.57   9.49
  Failure                   4.64   6.34    11.78  10.30     9.42   1.29
Temporal Persistence       71.21  43.55    67.93  57.79    81.92
Resistance to Extinction    1.86   1.09     1.04   1.00     2.00   1.15

Note. SR = social reinforcement; TR + SR = token plus social reinforcement.

behavior received strong support both from
the Phase 1 findings, which highlighted the
importance of effort attributions for persis-
tence at an achievement task, and from the
major experimental finding of Phase 2, that
the treatment-induced tendency for exper-
imental subjects to attribute failure to effort
was paralleled by significant increases in
persistence by those subjects. In addition,
the findings provided a strong empirical
foundation for the rationale of attribution-
change programs that have focused on the
development of effort-oriented schemata as
their main objective.

The findings, therefore, give strong sup-
port to the major tenet of the attribution
model of achievement motivation, that
causal ascriptions influence and perhaps
even determine subsequent achievement
behaviors. The present experiment
gave explicit support for the ki
assumed by Dweck (1975) in the initial test
of her model of attribution retraining.
Evidence in support of the primary hy-
potheses regarding the effectiveness of the
experimental training procedures in modi-
fying children's attributional patterns
was demonstrated on all tasks at IPT, and
the durability of the training effects was
strongly evident at the delayed posttest ad-
ministered 7-9 days after training. Fur-
thermore, the maintenance of training ef-
fects on the modification of attributional
patterns was still evident for experimental
subjects at the follow-up posttest conducted
4 months later, though there was some at-
tenuation on the transfer tasks. Generali-
zation of training effects was evi-
dent across the posttest tasks, including a
more remote, more school-related transfer
task, at both IPT and DPT,


fi to be most prod , y 
apres ge ra Un -pencil 
measures, The utility of such a measure in 
future research will 


suitable a measure of attributional 
predispositions as has been apparently as- 
many previous studies. As a 




measure designed to apply to achievement 
situations generally, the IAR may have lim- 
ited application in predicting attributional 
predispositions in a specific task or situation. 
Investigations that base their findings solely 
on attributional responses measured by 
means of the IAR perhaps should therefore 
be interpreted with caution, and use of be- 
havioral measures should be preferred in 
future attribution studies. 

In addition, only very limited support for 

attributional change was reflected in the 
paper-and-pencil measures. No change at 
all was registered on any of the IAR subscale 
measures, and on the (E—) subscale of the 
EAS only SR subjects were shown to in- 
crease significantly their choice of effort at- 
tributions from their pretest levels. Taken 
together with the Phase 1 findings, for male 
children at least, it appears that the EAS 
may be a more sensitive measure of change 
in effort attributions and that the develop- 
ment of an attribution scale along the lines 
of the EAS approach may be a profitable aid 
to future research in the field. 

In terms of prior experimental findings 
(Weiner et al., 1971; Weiner et al., 1972), the 
most discrepant result of Phase 1 was that 
the hypothesized positive relation between 
persistence behaviors and the attribution of 
failure to bad luck was not supported. 
Several studies have reported a reluctance 
of subjects to invoke luck explanations (e.g., 
Deaux & Emswiller, 1974), and such a ten- 
dency, also noted in the present investiga- 
tion, may have contributed to the discrepant 
findings. Furthermore, as Crandall et al. 
(1965) noted, as yet there is no information 
available to determine whether children 
have any generality in their belief in the 
power of various kinds of abstract, imper- 
sonal, external forces such as luck or fate. 

In Phase 2, several possible explanations
may be suggested to account for the subsid-
iary finding of the absence of differential
effectiveness of the two training procedures.
With the TR+SR treatment, it is possible
that tokens provided a more primitive level 
of reinforcement than was actually necessary 
to initiate the new behavior, so that the 
condition was characterized by “reinforce-
ment overkill” (Forness & MacMillan, 1972).
Such a finding is of course not incompatible
with prior evidence of developmental trends
in children's responsiveness to different 
classes of reinforcers. Similarly, the possi- 
bility that any additional reinforcing effec- 
tiveness achieved by pairing tokens with 
social reinforcement may have been canceled 
or concealed because of distracting effects in 
the reinforcement procedures may be an- 
other contributing factor. In the TR+SR 
condition, subjects eagerly awaited the panel 
of lights on the attribution box to activate, 
after pressing the button that registered 
their attribution after each training trial. 
An alternative possible contingency of which 
subjects may have become "aware" in that 
situation was the contingency between token 
reward and the pressing of two specific but- 
tons rather than the actual contingency be- 
tween the reward and the specific attribution 
they conveyed by means of the button 
pressing. 

The highly successful application of sys- 
tematic social reinforcement in modifying 
cognitive attributions and consequent 
achievement behavior, however, provides 
one of the important contributions of the 
study, especially in view of its implications 
for the adaptation of the training procedure 
to classroom groups. Furthermore, the ef- 
fectiveness of the procedure is highlighted 
when it is recalled that training was com- 
pleted over a concentrated series of trials in 
a short period lasting approximately 1 hour. 
The procedure thus becomes feasible as an 
individualized instruction device for use 
within a remedial context. 

It is essential, however, that future studies 
investigate the generalization of experi- 
mental training to classroom achievement 
behavior as well as investigate applied 
classroom programs. Although transfer was 
obtained on a school-related task in the 
present study (anagrams task), the training 
procedure was aimed at only short-term 
behavior change and focused only on the 
narrow aspect of children’s achievement 
behavior in the experimental context; thus 
this transfer result is only a tentative indi- 
cation of the possibility that treatment ef- 
fects may have transferred to the subjects'
general achievement behavior. Some en-
couragement is found, however, in the gen-
eralization of some treatment results to an
independent tester in the follow-up study.
The differential results found for the ex- 
perimental groups at follow-up in relation to 
the two testers on success trials for the 
training and anagrams tasks must be re- 
garded only as suggestive in such an explor- 
atory study. 

A further area in need of investigation in 
any replication concerns the order of post- 
tests. While the persistence findings of 
Phase 2 are of particular importance, it is 
possible, as Dweck (Note 1) pointed out, that 
they may have been inflated by a possible 
rehearsal-training effect, as persistence data
were obtained following attributional as-
sessment on the three achievement tasks in
which overt attributions were made though 
no reinforcement was given. A further study 
would need to include an extra treatment 
whereby persistence behavior is assessed 
immediately after training and prior to the 
attribution posttests, to assess the possible 
influence of this effect. 

The feasibility of the wider application of 
the model of attribution retraining tested in 
the current investigation is dependent to a 
large degree on some as yet unanswered 
questions concerning the durability and 
generality of attributional schemata. The 
exploration of the generality of the findings 
to females as well as males must, of course, 
assume some priority. Furthermore, as well 
as determining the generality of attribu- 
tional patterns across a variety of tasks and
situations, it would also be of great value to 
monitor the change in attributional sche- 
mata over time. The possibility of long- 
term attributional change, and the general- 
ization of experimental training and
trainer-specific effects back to the within- 
class situation and teacher, will be depen- 
dent to some degree on whether attributional 
patterns are stable from year to year and in 
successive school classes or whether new at-
tributional schemata are developed from
class experiences.

A difficult but necessary further devel- 
opment in future programs would be the 
shift from the use of arbitrary external re-
inforcement systems in the acquisition stage
to the generating of systems of self-rein-
forcement that would operate to support
attributional change in the maintenance
stage. Theoretically, it is during this
maintenance process that appropriate at- 
tributional behavior will be accelerated by 
the natural consequences intrinsic to task 
completion, social approval, feelings of 
self-worth, and satisfaction of assuming 
self-responsibility, which internalized, ef- 
fort-oriented schemata should generate.
Such a process needs to be empirically 
monitored, however, to provide guidelines 
for appropriate training schedules and 
techniques. 

Together with Dweck’s (1975) demon- 
stration of the success of attributional re- 
training, the findings of the present study are 
extremely relevant to all aspects of remedial 
teaching. The results suggest that appro- 
priate, achievement-enhancing attributions 
may be relatively easily established by sys- 
tematic reinforcement and may facilitate 
ongoing achievement activities. The 
training procedures are direct and well 
within the grasp of ordinary teachers. By 
this means, enduring motivational strategies 
may potentially be inculcated and general- 
ized to a wider range of achievement activi- 
ties. Such a process may have the effect of 
enabling the individual to persevere and to 
achieve success and competency and, con- 
sequently, of increasing the independence of 


the learner. 


Reference Note 


1. Dweck, C. Personal communication to R. L. Debus, 
October 1975. 
the Reading and Math Achievement of Anglo American and Mexican American Children
Previous studies relating field dependence to the superior academic achieve-
ment of Anglo American children relative to Mexican American children have 
relied exclusively on single-method approaches of measuring this cognitive 
style. The present study attempts to make a more comprehensive test of the 
relationship between field dependence and achievement by comparing mem- 
bers of both cultural groups on three commonly used measures of field depen- 
dence in order to determine the consistency of cross-cultural differences, in- 
tercorrelations, and predictive validity of these measures for Anglo American 
and Mexican American school children. Results generally failed to support 
the assumptions that (a) Mexican American children are more field dependent 
than Anglo American children, (b) intercorrelations between the three field- 
dependence tests should be significant and comparable for members of both
cultural groups, and (c) field dependence is of substantial importance to the
school achievement of Anglo American and Mexican American children. The 
educational implications of the findings are discussed. 


Psychologists have been concerned with 
the role of cognitive styles in affecting the 
reading and math achievement of elemen- 
tary school children (Denny, 1974; Eakin & 
Douglas, 1971; J. Kagan, 1965; Robinson & 
Gray, 1974; Santostefano & Paley, 1964). 
One cognitive style dimension receiving 
considerable attention in this regard is 
Witkin’s field dependence-independence 
construct (Witkin et al., 1954). A field- 
dependent cognitive style is defined by a 
global mode of perception so that the orga- 
nization of the field as a whole dominates 
perception of its parts. In contrast, a field-
independent cognitive style is characterized
by a more analytic approach to the stimulus 
field, which is reflected in greater ease in 
overcoming an embedding context (Witkin
et al., 1962). 

Interest in field dependence as a factor in 
children's academic success stems from the 
assumption that similar analytic require- 
ments are inherent in both tests of this par- 
ticular cognitive style and also reading and 
math (see e.g., Cohen, 1969). Thus, children 
who are in the early stages of reading acqui- 
sition, and particularly those learning to read 
through phonemic instruction, must visually 
identify discrete vowels and syllables within 
the framework of a larger, more complex 
word. Similar disembedding procedures are 
also apparent in carrying out math compu- 
tations, since such operations usually require 
the child to conceptually break up numbers 
into their various components to arrive at 
new relations. 

Studies have found Anglo American chil- 
dren to be more field independent than 
Mexican American children (Buriel, 1975; S. 
Kagan & Zahn, 1975; Ramirez & Price-Wil- 
liams, 1974; Sanders, Scholz, & S. Kagan, 
1976; Canavan, Note 1) and, on the average, 
to obtain higher scores on tests of reading 
and math achievement (Coleman et al., 1966; 
Grebler, Moore, & Guzman, 1970). These 
parallel differences in cognitive style and 
school achievement suggest that the lower
academic performance of Mexican American 
children may be related to their more field- 
dependent cognitive style. 

Early studies by Watson (1969) and 
Canavan (Note 1), using combined samples 
of Anglo American, Mexican American, and 
Afro American grade school children, found 
a significant positive correlation between 
field independence and reading and math 
achievement. More recently, S. Kagan and 
Zahn (1975) found that in contrast to Anglo 
American children, Mexican American 
children were more field dependent and 
scored lower on tests of reading and math 
achievement. Their analysis also showed a 
relation of field dependence to both reading 
and math achievement for both groups. 

The validity of Witkin’s field depen- 
dence-independence construct rests on the 
correlation of responses across a number of 
different assessment methods (Witkin et al., 
1962). A review of the literature reveals, 
however, that with only two exceptions
(Eakin & Douglas, 1971; Gluck, 1973),
studies relating field dependence to chil- 
dren’s academic abilities have relied on a 
single-method approach of determining 
children’s cognitive style. Results of the two 
studies using a multimethod approach re- 
vealed only modest interrelation between 
alternate measures of field dependence 
(Gluck, 1973, using the Children’s Embed- 
ded Figures Test [CEFT] and the Rod- 
and-Frame Test [RFT]) and between these 
measures and level of academic achievement 
(Eakin & Douglas, 1971, using CEFT and the 
Wechsler Intelligence Scale for Children 
Block-Design subtest [WISC-BD]; Gluck, 
1973, using CEFT and RFT). The use of 
multiple measures is more common with 
adult subjects, but here also the literature 
reveals a number of discrepancies between 
various tests of field dependence (for a re- 
view, see Arbuthnot, 1972). The lack of 
relation between measures may be due in 
part to the fact that Witkin’s findings re- 
garding consistency among field-dependence 
tests were based on samples of male children 
(Witkin et al., 1962). Since there are no 
studies with Mexican American children 
using a multimethod approach, the role of 
field dependence in explaining Mexican 
American-Anglo American differences in
school achievement may be overstated.


The present study examines the interre- 
lationship between three measures of field 
dependence—the Portable Rod-and-Frame 
Test (PRFT), the Children's Embedded 
Figures Test, and the Wechsler Intelligence 
Scale for Children Block-Design subtest— 
and compares the relationship of each of 
these tests to the reading and math 
achievement of Anglo American and Mexi- 
can American children to determine the re- 
liability, intercorrelations, and predictive 
validity of these measures for both groups of 
children. Witkin accepts the WISC-BD as
an effective measure of field dependence 
(Witkin, 1967; Witkin et al., 1974) and cites 
it along with the CEFT and PRFT as being 
among the most commonly used measures of 
this psychological construct (Witkin & 
Berry, 1975). 


Method 
Subjects 


A total of 80 children participated as subjects in the 
study: 40 Mexican American and 40 Anglo American. 
Ten children from each cultural group were drawn from 
each of the following grades: first, second, third, and 
fourth. An equal number of male and female children 
were chosen from each grade level for both cultural 
groups. All subjects were students in the same semi- 
rural elementary school in southern California. The 
school is predominately (75%) Mexican American, as is 
the surrounding community, and there is no busing of 
children either to or from other areas. Mexican 
American subjects were randomly selected from a pool 
of second- and third-generation students. Seventeen 
second- and 28 third-generation children participated
in the study. First-generation children (born in Mexico)
and those whose dominant language was Spanish
(as determined by school records) were excluded from 
the study, since their reading and math scores more 
accurately reflect their limited understanding of English 
rather than their actual competencies in these academic 
areas. Anglo American children were randomly selected
from the same classrooms as the Mexican
American children.

All Mexican American children participating in the 
study were bilingual to some extent, although Spanish 
was not their primary or dominant language. School 
records and previous research in this same geographical 
area (Buriel, 1975; Landman, 1954) indicate a stable 
residential pattern for second- and third-generation 
families; over half of the Mexican American subjects 
had one or both parents reared in the community sam- 
pled. Anglo American children and their parents were 
all born in the United States and, for the most part, were 
more transient, with few of them having lived in the
community for more than one generation. All children 
in the study (both Mexican American and Anglo 
American) were from low-income families and partici- 
pated in federally funded poverty programs at their 
school. Fathers of all children were employed in a va- 
riety of semiskilled and skilled jobs. 

For purposes of analysis, Grades 1 and 2 and Grades 
3 and 4 were combined to form two separate grade levels.
This was done because many single classrooms at
the school contained multiple grade levels within the 
same classroom, thus raising the possibility of intergrade
diffusion of instructional effects within classrooms.
Grade levels combined in this manner more
closely approximated the actual distribution of students 
in classrooms, since 88% of the first and second graders 
and 90% of the third and fourth graders participating 
in the study were assigned to multiple grade level 
classrooms. There were no significant age differences
between Anglo American and Mexican American 
subjects at either the lower grade level (for first and 
second combined, t = .43) or upper grade level (for third 
and fourth combined, t = .18). Ages of lower grade level 
children ranged from 6.4 years to 8.1 years, with a mean 
age of 6.8 years. Upper grade level children ranged in 
age from 8.3 years to 10.5 years, with a mean age of 8.9 
years. 


Procedure 


Subjects were individually administered the PRFT, 
the CEFT, and the WISC-BD. All children were tested 
by the same Mexican American male experimenter. 
Testing was done in two separate sessions 2 or 3 days 
apart. Two tests were administered in the first session
and one in the second. Half of the subjects in each 
grade level were given the PRFT, the CEFT, and the 
WISC-BD in that order, while the other half received 
these tests in the reverse order. There were no differences
between subjects due to order of test administration.


Tests 


Portable Rod-and-Frame Test. This test was developed
as a portable version of Witkin et al.'s (1954)
original darkroom Rod-and-Frame Test and has been
validated against the original (Oltman, 1968). The
PRFT was modified slightly (after Gerard, Note 2) by
placing the black silhouette of a human figure over the
standard rod, which the subject attempted to adjust to
the vertical. This was done to convey more concretely
to young subjects the concept of verticality, that is,
making the man inside the box stand up straight.
The subject’s score was the average of the absolute deviations
from the true vertical for the eight trials given.
Higher scores were in the direction of greater field dependence;
lower scores were in the direction of greater
field independence.
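
The scoring rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name prft_score, the use of degrees as the unit of deviation, and the sample values are hypothetical and are not given in the article.

```python
def prft_score(deviations, n_trials=8):
    """Return the mean absolute deviation from true vertical.

    `deviations` holds one signed deviation per trial (assumed here to be
    in degrees). Higher scores point toward greater field dependence.
    """
    if len(deviations) != n_trials:
        raise ValueError("expected exactly %d trials" % n_trials)
    return sum(abs(d) for d in deviations) / n_trials


# Hypothetical signed deviations for one child across the eight trials.
example_trials = [3.0, -5.0, 2.0, -4.0, 6.0, -1.0, 0.0, 3.0]
example_score = prft_score(example_trials)
```

Taking absolute values before averaging keeps tilts to the left and to the right from cancelling each other out.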

Children's Embedded Figures Test. This test is a
modification of the original Embedded Figures Test,
developed especially for use with young children (Witkin,
Oltman, Raskin, & Karp, 1971). The test procedure
involves the administration of two series. Subjects
are asked to locate the shape of a triangle (Series 1) and
a house (Series 2), which is embedded in a complex
geometric design. The complex designs are presented
one at a time on 14 X 21.5 cm cards. Responses to each
card are scored as 1 or 0. A score of 1 is given for correct
responses on the first attempt to locate the hidden
shape. A maximum of 25 is possible on this test, with
higher scores indicating greater field independence.
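
As a rough sketch of this scoring rule, the short Python function below counts the cards solved on the first attempt. The function name ceft_score and the sample response pattern are hypothetical; only the 1-or-0 rule and the maximum of 25 come from the description above.

```python
def ceft_score(first_attempt_correct, max_items=25):
    """Count the cards on which the hidden shape was found on the first try.

    `first_attempt_correct` is a sequence of booleans, one per card.
    Each card contributes 1 (correct on first attempt) or 0 (otherwise).
    """
    if len(first_attempt_correct) > max_items:
        raise ValueError("more responses than cards in the test")
    return sum(1 for correct in first_attempt_correct if correct)


# Hypothetical response pattern: 18 cards solved on the first attempt.
example_responses = [True] * 18 + [False] * 7
example_ceft = ceft_score(example_responses)
```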

Wechsler Intelligence Scale for Children Block- 
Design Subtest. This test is taken from the Perfor- 
mance scale of the WISC (Wechsler, 1949). Subjects 
are given a set of multicolored blocks and then presented
with colorful geometric designs of increasing
complexity. Their task is to copy the reference designs
by the appropriate arrangement of blocks within the 
allotted time period. Satisfactory performance on the 
WISC-BD test requires children to conceptually “break 
up” the reference design into discrete parts corre- 
sponding to the color and shape of the individual blocks, 
so that they can be rearranged to produce the appro- 
priate reference stimulus. Each design has a minimum 
score of 4, with bonus points being given for more rapid 
completion of the correct design. A maximum of 55 
points are possible, with higher scores indicating greater 
field independence. 

Metropolitan Achievement Tests (MAT). Measures
of reading and math achievement were taken from the
Reading and Total Math subtests of the MAT, which
was administered to all children near the end of the
school year as part of the school's regularly scheduled
assessment testing. Children were tested with different
batteries of the MAT depending on their grade level.
Raw scores for both reading and math were converted
to standard scores according to the procedure described
in the test handbook (Durost et al., 1971). Standard
scores derived by this method express results for a
subtest area (e.g., reading) for all batteries of the MAT
on a single common scale. Achievement scores and
their derived statistics are expressed in terms of these
standard scores.


Results


Data were collected and analyzed in three
areas to determine (a) the reliability of the
three field-dependence tests in measuring 
cultural differences in field dependence, (b) 
the intercorrelation among the three tests for 
members of both cultural groups, and (c) the 
relationship of each of the three tests to the 
reading and math achievement of Mexican 
American and Anglo American children. 
Analysis of variance using multiple re- 
gression procedures was employed as the 
primary method of analysis in order to test 
the significance of the contribution made to 
reading and math achievement by the in- 
teraction of categorical variables with the 
continuous field-dependence variables. 
Table 1
Means and Standard Deviations for School Achievement and Three Field-Dependence
Measures

                                                         Portable          Children's        WISC^b
                                                         Rod-and-Frame     Embedded          Block-Design
                     Reading           Math              Test^a            Figures Test      subtest
Subject              M        SD       M        SD       M        SD       M        SD       M            SD
Anglo American
  Grades 1 and 2     32.85^c  10.69    39.15    14.87    13.85    4.27     12.50    4.22     [illegible]  5.91
  Grades 3 and 4     48.35    10.72    57.65    12.28    10.15    6.34     16.60    5.02     23.35        12.24
Mexican American
  Grades 1 and 2     29.20    9.20     35.70    9.54     14.29    4.65     12.15    4.95     8.05         6.10
  Grades 3 and 4     46.20    9.44     52.70    11.66    14.07    5.62     15.05    5.95     13.30        9.31

Note. N = 20 for each grade cell.
^a Lower means are in the direction of greater field independence.
^b WISC = Wechsler Intelligence Scale for Children.
^c Means for reading and math are standardized scores.


Table 1 presents the means and standard
deviations for the two school achievement 
and three field-dependence measures. 


Cultural Differences in the Three Field- 
Dependence Measures 


A separate Culture X Sex X Grade Level 
analysis of variance was performed on the 
scores of each of the three field-dependence 
tests to assess how uniformly the three in- 
struments measured cultural differences in 
field dependence. 

Analyses revealed no cultural differences
on either the PRFT or the CEFT. There
was a significant culture main effect for the
WISC-BD, F(1, 72) = 8.49, p < .01, with
Anglo American children scoring higher than
Mexican American children. Sex as a factor
was significant only for the PRFT, F(1, 72)
= 4.55, p < .05, with boys more field independent
than girls. Grade was not significant
for the PRFT, but highly significant for
both the CEFT, F(1, 72) = 9.89, p < .001,
and WISC-BD, F(1, 72) = 22.79, p < .001,
analyses. In addition, there was a significant
Culture X Grade Level interaction for
the WISC-BD, F(1, 72) = 4.56, p < .05.
There were no other significant interactions.
Comparisons (Tukey) of the WISC-BD
means for the Culture X Grade Level interaction
indicated that Anglo American children
scored significantly higher on this test
than Mexican American children at the
upper grade level (p < .05) only.


Intercorrelations Between the Three 
Field-Dependence Tests 


Correlations between the three field-dependence
tests were computed separately
for Anglo American and Mexican American
children to determine if the magnitude and
pattern of intercorrelations was the same for
both cultural groups. For Anglo American
children, all tests were significantly intercorrelated:
CEFT-PRFT (r = -.38, p < .01),
CEFT-WISC-BD (r = .47, p < .01), and
PRFT-WISC-BD (r = -.41, p < .01). For
Mexican American children, CEFT correlated
significantly with both PRFT and
WISC-BD, but the correlation between the
two latter tests did not reach significance:
CEFT-PRFT (r = -.44, p < .01), CEFT-WISC-BD
(r = .42, p < .01), and PRFT-WISC-BD
(r = -.23, ns).
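
For readers who want to see how a correlation of this size is evaluated, the following Python sketch converts a Pearson r into the usual t statistic with n - 2 degrees of freedom. It rests on an assumption not stated in the article: that all 40 children in a cultural group contributed to each within-group correlation. The function name is hypothetical, and the sketch does not reproduce whatever significance procedure the authors actually used.

```python
import math


def t_for_correlation(r, n):
    """Return (t, df) for testing a Pearson correlation against zero.

    Uses the standard identity t = r * sqrt(n - 2) / sqrt(1 - r**2).
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df


# PRFT-WISC-BD correlation for the Mexican American group, assuming n = 40.
t_value, degrees_of_freedom = t_for_correlation(-0.23, 40)
```

The resulting t would then be compared with a tabled critical value for 38 degrees of freedom at the chosen significance level.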


Relationship of the Three Field- 
Dependence Tests to Reading and Math 
Achievement 


Three separate Culture X Sex X Grade 
Level X Field Dependence analyses were 
performed on the data for achievement: one 
for each of the three field-dependence mea- 
sures as a predictor of reading achievement 
and one for each of these same three mea- 
sures as a predictor of math achievement. 

Results of the three reading analyses were
consistent in showing no significant main
effects of culture, sex, or field dependence
and no significant interactions. Only grade
level was significantly related to reading
performance (p < .001) in all three analyses
[Analysis 1, PRFT: F(1, 64) = 62.61; Analysis 2,
CEFT: F(1, 64) = 44.03; Analysis 3,
WISC-BD: F(1, 64) = 36.81].

Results for math achievement revealed no
significant culture main effects. There was
a significant main effect for sex (p < .05) for
all analyses, with girls scoring higher than boys
[Analysis 1, PRFT: F(1, 64) = 4.03; Analysis
2, CEFT: F(1, 64) = 5.01; Analysis 3,
WISC-BD: F(1, 64) = 4.52]. Grade Level
was again highly significant (p < .001) in all
three math analyses [Analysis 1, PRFT: F(1,
64) = 38.67; Analysis 2, CEFT: F(1, 64) =
$4.99; Analysis 3, WISC-BD: F(1, 64) =
22.69]. The PRFT as a measure of field
dependence was not significantly related to
math achievement. There were, however,
significant main effects for both the CEFT
[Analysis 2, F(1, 64) = 14.54, p < .01] and the
WISC-BD [Analysis 3, F(1, 64) = 9.22, p <
.01]. A significant Culture X CEFT interaction
[Analysis 2, F(1, 64) = 4.05, p < .05]
was the only significant interaction for the
math achievement analyses. Inspection of
the standardized regression weights of both
groups indicated the CEFT made a larger
contribution to the variance in math
achievement for Mexican American children
(β = .69, p < .05) than Anglo American
children (β = -.07, ns).
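
The idea of a categorical variable interacting with a continuous predictor can be illustrated with a small least-squares sketch in plain Python. Everything in it is an assumption made for illustration: the data are synthetic, the function names are hypothetical, and the model omits the sex and grade-level terms that the study's analyses included. It shows only how a product term lets the slope of an outcome on a test score differ between two groups.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations update both sides together.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_interaction_model(group, score, outcome):
    """Fit outcome = b0 + b1*group + b2*score + b3*group*score by least squares."""
    rows = [[1.0, g, s, g * s] for g, s in zip(group, score)]
    k = 4
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data: group 0 has a flat line at 40; group 1 follows 20 + 1.5 * score.
group = [0, 0, 0, 0, 1, 1, 1, 1]
score = [8, 12, 16, 20, 8, 12, 16, 20]
outcome = [40.0, 40.0, 40.0, 40.0, 32.0, 38.0, 44.0, 50.0]
b0, b1, b2, b3 = fit_interaction_model(group, score, outcome)
```

In this sketch b2 is the slope for the group coded 0 and b2 + b3 is the slope for the group coded 1, so a nonzero b3 corresponds to an interaction of the kind reported above.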


Discussion 


Overall, results of the present investigation
reveal discrepancies in the relation of
the three field-dependence tests between
Anglo American and Mexican American
children and between these tests and school
achievement for these same groups of children.
Perhaps the most significant finding
of the first area of investigation is the failure
of the PRFT, CEFT, and WISC-BD to uniformly
measure cultural differences in field
dependence. Absence of a significant cultural
difference on the PRFT is inconsistent
with results of previous studies that have
found Anglo American children to score
more field independently than Mexican
American children on various rod-and-frame
instruments (Buriel, 1975; S. Kagan & Zahn,
1975; S. Kagan, Zahn, & Gealy, 1977; Ramirez
& Price-Williams, 1974; Canavan, Note
1). A nonsignificant culture main effect for
the CEFT is consistent, however, with results
of the only two other studies (S. Kagan
et al., 1977; Knudson & Kagan, in press)
comparing these two groups of children on
this measure of field dependence. Performance
on the WISC-BD revealed mixed results;
a significant Culture X Grade Level
interaction indicated that the cultural difference
on this measure was significant only
among upper grade level children.

The use of multiple methods of assessing 
field dependence represents an important 
distinction between the present investigation 
and earlier studies comparing the cognitive 
styles of Anglo American and Mexican 
American children: Previous studies have 
not demonstrated consistent cultural dif- 
ferences on more than one measure of field 
dependence. Therefore, results of the 
present investigation do not generally sup- 
port the assertion that Anglo American 
children are more field independent, or more
analytical in their field approach, than 
Mexican American children. With the ex- 
ception of the present study, investigations 
using rod-and-frame measures have consis- 
tently observed cultural differences. In 
contrast, the three studies (including the 
present investigation) comparing Anglo 
American and Mexican American children 
on the CEFT have consistently observed no 
cultural difference. The almost consistent 
presence or absence of cultural differences 
between studies as an apparent function of 
instrumentation suggests that it may be 
more appropriate to evaluate Anglo American-Mexican
American differences in cognitive
style more narrowly in terms of the
specific instrument on which such differences
appear, rather than as more general
claims of cultural differences in field
dependence in the broader sense.

Results of the second area of investigation 
revealed differences in the intercorrelations 
of the field-dependence tests between Anglo 
American and Mexican American children. 
While all intercorrelations were significant 
for Anglo American children, the PRFT and 
WISC-BD failed to show a significant level 
of correspondence for the Mexican American 
sample. The WISC-BD thus appears to
measure aptitudes of Mexican American 
children that, unlike Anglo American chil- 
dren, are unrelated to performance on the 
PRFT. The more uniform and significant 
correlations between the tests for the Anglo 
American sample suggest the possibility
that field dependence may be a more mul- 
tifactorial construct for these children, 
which, unlike Mexican American children, 
includes a component made up of aptitudes 
that the WISC-BD shares in common with 
both the PRFT and the CEFT. The 
meaning of field dependence, therefore, may 
be different for Anglo American and Mexican
American children.

The final question of this study deals with
the relation of the three field-dependence 
tests to measures of reading and math 
achievement. None of the cognitive style 
tests were significantly related to reading for 
members of either cultural group. Although 
unexpected from the point of view that 
performance on both field-dependence tests 
and reading involve similar disembedding 
abilities, the present findings for reading are 
consistent with theoretical speculation that 
field dependence is unrelated to various 
types of verbal abilities (Witkin et al., 1962). 
For math, a significant Culture X CEFT in- 
teraction indicated that the relation of the 
CEFT was not reliably equivalent for both 
cultural groups. The CEFT accounted for 
a significant and greater proportion of the 
variance in math for Mexican American than 
Anglo American children. The absence of 
a significant CEFT-math relationship for 
Anglo American children suggests the pos- 
sibility that performance on the CEFT may 
be affected by aptitudes that are differen- 
tially related to math achievement for Anglo 
American and Mexican American children. 
Such a possibility again raises suspicion that 
the field-dependence construct may not be 
equivalent for both groups of children. 
Recently, however, S. Kagan et al. (1977) 
observed no relation of CEFT performance 
to math for a sample of lower socioeconomic 
status Anglo American and Mexican American
children. Hence, the finding of a significant
Culture X CEFT interaction for
math in the present study should be regarded
as tentative.

Only the WISC-BD showed a significant
relation to math for both groups of children.
Thus, except for the fact that the WISC-BD
was unrelated to reading, its relation to math 
represented the only clear instance of the 
expected relation of field dependence to 
achievement for both cultural groups. This 
finding indicates that of the three field- 
dependence tests used in this study, the 
WISC-BD may have the most cross-cultural 
validity as an index of children's mathe- 
matical abilities. 

Finally, although there were no sex dif- 
ferences in reading, girls of both cultural 
groups scored significantly higher in math 
than boys, a finding consistent with results 
of large-scale studies using predominately 
lower socioeconomic status children (Mac- 
coby & Jacklin, 1974). Also, Anglo Ameri- 
can and Mexican American children did not 
differ significantly in either reading or math 
achievement. This is an important finding, 
since previous studies have tended to em- 
phasize only the inferior academic perfor- 
mance of Mexican American children—an 
unfortunate situation that may lead to an 
unquestioning acceptance of the expectation 
that children of this cultural group are in- 
variably academic failures (Johnson, Gerard, 
& Miller, 1975; Rosenthal & Jacobson, 
1968).

Results of the present investigation are 
inconsistent with previous studies by S. 
Kagan and Zahn (1975) and Canavan (Note 
1), who reported a significant relation of field 
dependence to both the reading and math 
achievement of Anglo American and Mexi- 
can American children. A number of ex- 
planations could account for the discrepancy 
between studies. The first of these concerns 
differences in instrumentation. Previous
investigations by S. Kagan and Zahn and
Canavan both used the same rod-and-frame
apparatus, the Gerard (Note 2) Man-in-a-Frame
Test, as their measure of field dependence.
The task requirements of this
instrument appear very similar to those of
the PRFT used in the present study.
However, unlike the PRFT, the Man-in-a-Frame
Test has not been empirically validated
against Witkin's original darkroom
Rod-and-Frame Test, and, thus, may have
only face validity as a substitute for the
original Rod-and-Frame Test. It may be
that some unique characteristics of the
Man-in-a-Frame Test, perhaps unrelated to
field dependence, make it a more sensitive
indicator of children’s reading and math
achievement. A further difference related
to instrumentation centers on the specific
achievement tests used to measure reading
and math performance. The present study
used the Metropolitan Achievement Test,
whereas earlier investigations have used the
Stanford Achievement Test (Canavan, Note
1), the Cooperative Primary Tests, and the
Comprehensive Test of Basic Skills (S.
Kagan & Zahn, 1975). Although the content
and format of these four tests appear quite
similar, the unique characteristics of the
MAT may cause it to be less affected by
those aptitudes that contribute to performance
on tests of field dependence.

Another possible explanation of the
present results considers the extent of cultural
differences in both field dependence
and school achievement. Unlike previous
investigations, the present study revealed a
general absence of cultural differences on
both these variables. This suggests the
possibility that the relation of field dependence
to achievement may appear only in
those populations where the two cultural
groups differ significantly on both these
variables. Consistent with this interpretation,
Kagan et al. (1977) observed that in
contrast to kindergarten, fourth-, and
sixth-grade children at the same lower socioeconomic
status school, a subsample of
second-grade Anglo American and Mexican
American children did not differ significantly
in reading and math achievement nor
in field dependence as measured by the
CEFT. Additionally, no relation of field
dependence to achievement was observed for
the sample of second-grade children in contrast
to other children in the study for whom
such a relation was found to exist. Hence,
the possibility that field dependence and
achievement are unrelated in populations
undifferentiated on both these variables seems tenable.

Finally, contextual factors related to
demographic characteristics of the sample
and testing procedures may have contributed
to the discrepancy between studies.
Thus, unlike previous studies, Mexican
American children made up a substantial
majority of the students at the school where
the present study was conducted. Furthermore,
testing for field dependence and
achievement was carried out under different
conditions (individual versus group
testing, respectively) and by different persons
for initially different purposes. The
possible influence of these contextual factors
on performance on any given test could thus
have affected the outcome of the study.

The present results have important educational
implications. A review of the goals
and standards of federal education programs 
for young children (White et al., 1973) re- 
veals that some of these programs employ 
field-dependence tests to predict the 
achievement-related abilities of students. 
The use of these measures for evaluating 
student academic potential has apparently 
proceeded on the assumption that (a) tests 
such as the PRFT, CEFT, and WISC-BD are 
all equally reliable indicators of field de- 
pendence or analytic ability; (b) performance 
on all these tests correlates highly with aca- 
demic achievement; and (c) these tests are 
equally valid for children from different 
cultural backgrounds. The present findings 
do not generally support these assumptions, 
but instead indicate that alternate measures 
of field independence sometimes yield dif- 
ferent results when used either to compare 
the cognitive styles of Anglo American and 
Mexican American children or to predict 
their reading and math achievement. Since
a common practice is to use only a single in- 
strument to measure field dependence, ed- 
ucators and researchers should be aware that 
the results obtained in such cases may be 
more a function of the particular instrument 
being used, rather than actual differences in 
the trait they presume to be measuring in 
their population of subjects. Failure to re- 
alize the apparent limitations of these tests 
in educational settings, especially those 
containing children from different cultural 
backgrounds, may result in inaccurate as- 
sessment of children’s academic potential 
and may lead, in turn, to unrealistic teacher
expectations.


Effects of Three Types of University Lecture
Notes on Student Achievement


Vaughan Collingwood and David C. Hughes 
University of Canterbury, Christchurch, New Zealand 


During a series of three electronics lectures, university students made use of 
three different kinds of lecture notes. These were (a) duplicates of the lectur- 
er’s detailed notes; (b) copies of the headings, key points, diagram outlines, ta- 
bles, and references from the lecturer’s notes with spaces for the students to 
add additional notes as appropriate; and (c) the students’ own notes taken 
during the lectures. An analysis of variance indicated that there were significant
differences between the notes as measured by a delayed achievement test,
the order being, from high to low, a, b, and c. The students’ preferences for
the three types of notes were obtained before and after the experimental lectures.
A significant interaction between initial student preference and treatment
was found. However, student preference changed in favor of (a) following
exposure to the three types of notes during the experimental lectures.


The lecture method is widely used in
university teaching and will undoubtedly
continue to be used in the future. Therefore,
any procedures that enhance the effectiveness
of lectures are likely to have significant
effects on student learning.

One important lecture variable is the type
of student notes that is used. At one extreme
students are required to make their
own lecture notes and are provided with
nothing by way of handouts, whereas at the
other extreme they are provided with duplicates
of the instructor's lecture notes. In
between are various kinds of summary
handouts. Little research has been conducted
to investigate the effectiveness of
different types of lecture notes, and most of
the research that has been done is lacking in
external validity because various conditions
used in the studies differ markedly from the
real lecture situation. These atypical conditions
include subjects who participate in
the study only to meet a course requirement
or to receive course credit; lecture material
that is not related to the subjects’ course of
study; the use of tape recorders to present
the lecture material; the use of very short
lecture sessions; the use of posttests at unusual
times such as immediately following
the lecture; and the removal of the students’ 
lecture notes following the lecture to prevent 
revision (Aiken, Thomas, & Shennum, 1975; 
Annis & Davis, 1975; Carter & Van Matre, 
1975; Di Vesta & Gray, 1972; Hartley, 1976; 
Northcraft & Jernstedt, 1975). 

It is generally accepted that lecture notes 
can serve either or both of two functions. 
The first is an encoding function in which 
the lecture material is transformed into a 
more meaningful form for the learner and 
hence is easier to remember. The second is 
an external storage function in which the 
student is provided with material for later 
review (Aiken et al., 1975; Annis & Davis, 
1975; Carter & Van Matre, 1975; Di Vesta & 
Gray, 1972). The degree to which these 
functions operate depends on the conditions 
that exist in the particular situation. For 
example, if the students have no reason to 
review their lecture notes or if there is no 
opportunity for them to do so, the second 
function cannot operate. In most university 
contexts, lectures are followed by term ex- 
aminations some time after the completion 
of the lectures so that students have both 
reason and opportunity for review. This 
being the case, the external storage function 
of notes operates in addition to the encoding 
function in the normal university situation.

The encoding function should be maxi- 
mized when students are asked to make their 
own notes. However, since student-made 
notes contain omissions, errors, and over- 
simplifications (Hartley & Marshall, 1974; 
Maddox & Hoole, 1975), the external storage 
function is reduced with student-made 
notes. The external storage function should 
be maximized when duplicates of the lec- 
turer’s notes are given out, although in this 
situation the encoding function is lessened. 
This article sets out to investigate which of 
these functions is the more important in a 
realistic situation and whether it is possible 
to provide handouts that effectively combine 
both functions. 

In the 2 decades since Cronbach (1957) 
introduced the term Aptitude × Treatment
interaction, there have been many studies 
designed to identify interactions between 
instructional treatments and individual 
differences between students (Berliner & 
Cahen, 1973; Bracht, 1970; Tobias, 1976). 
At the beginning of the present study, the 
subjects were asked which of three types of 
lecture notes they would prefer to have and 
which they thought would be most efficient 
in terms of passing examinations. Almost 
all students responded in the same way to 
both questions. Approximately 25% of the
students indicated that they thought making 
their own notes was most effective, the same 
proportion said that having a copy of the 
lecturer’s notes was most effective, and the 
rest (50%) believed that an outline of the 
lecturer’s notes with spaces for additional 
student notes was most effective. 

This difference of opinion suggests that 
different students may use the encoding and 
external storage functions of lecture notes to 
different degrees. That is, students who 
believe that making their own notes is most 
effective may be those for whom the encod- 
ing function is of prime importance, whereas 
those who prefer a copy of the lecturer’s 
notes may rely heavily on the external stor- 
age function. If this is the case, then an in- 
teraction would be expected between the 
students’ preference for a particular kind of 
lecture notes and the kind of notes they ac- 
tually receive. Consequently, it was also 
decided to investigate this interaction.


Method


Subjects 


The subjects were 57 University of Canterbury, 
Christchurch, New Zealand, undergraduate engineering 
students taking the second professional year course, 
Electrical Engineering 2. The subjects received the 
experimental treatments during three digital electronics 
lectures that were part of the normal digital electronics 
course. 


Materials 


A sequence of three lectures covering the topic bist- 
ables, registers, and counters was used for this inves- 
tigation. The three lectures formed a complete unit 
and were judged by the lecturer to be similar in diffi- 
culty level and material covered. The lecturer's de- 
tailed notes for the topic were obtained, and two 
handouts were prepared for each of the three lec- 
tures. 

The first handout was a complete typewritten copy 
of the lecturer’s notes including neat diagrams (here- 
after called full notes). No special spaces were left for 
the students to add their own notes to the full notes. 
The second handout was an edited typewritten copy 
with all detailed text removed, leaving only the headings, 
key points, unlabeled diagram outlines, tables, and 
references (hereafter called partial notes). Space equal 
to the amount used in the full notes was left beneath the 
headings and key points in the partial notes for the 
students to add their own notes as appropriate. This 
note form requires the students to encode the material, 
but it also provides a structure that should improve the 
quality of the notes as an external store. 


Measurement Instruments 


Prequestionnaire and postquestionnaire. Both the 
prequestionnaire and postquestionnaire asked the 
students which of three types of lecture notes they 
would prefer and which type of notes they considered 
would be most efficient in terms of passing exams. It
was thought that students might select the type of lec-
ture notes that demanded the least effort of them, as
their answer to the first question, while actually be-
lieving that another lecture note form would be better
in terms of passing the exams, hence the inclusion of the 
second question. 

The final question in the prequestionnaire asked the 
students how often they discussed their notes with the 
other students. As 93% claimed that they sometimes
did, it was clear that detailed information regarding
note-exchanging behavior during the experiment would
be required. Accordingly, the postquestionnaire asked
the students whether they had read, exchanged, or
photocopied any of the other sets of handouts used in 
the experiment. 

The final part of the postquestionnaire was open- 
ended to allow the students to suggest improvements 
and present their ideas on the use of the experimental
handouts or any other forms of lecture material pre-
sentation.

Table 1
Experimental Design

Group    Lecture 1        Lecture 2        Lecture 3
1        Own notes        Partial notes    Full notes
2        Partial notes    Full notes       Own notes
3        Full notes       Own notes        Partial notes



The achievement test. A four-choice multiple- 
choice achievement test sampling the content of the 
experimental lectures was devised jointly with the lec- 
turer. The test was incorporated as part of the midterm 
examination, which also contained two essay questions. 
The students were aware that all topics covered up to 
the midterm examination would be testable and that 
the midterm mark would count 10% toward their final 
course grade. The test instructions stated that incor- 
rect answers would incur a penalty, and the standard 
guessing correction was applied. An item analysis was 
performed on the test questions using all the test papers 
submitted by the class (N = 70). On the basis of the 
item analysis results, one question was eliminated be- 
cause it failed to discriminate satisfactorily. The final 
posttest contained 11 questions—4 questions covering 
the content of Lecture 1, 3 questions covering the con- 
tent of Lecture 2, and 4 questions covering the content 
of Lecture 3. 
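
The article does not spell out the guessing correction. A minimal sketch, assuming the conventional formula-scoring rule (number right minus number wrong divided by one less than the number of choices), is shown below for four-choice items. The function name and the example counts are hypothetical and are not taken from the study.

```python
def corrected_score(num_right, num_wrong, num_choices=4):
    """Formula score: R - W / (k - 1). Omitted items are not penalized.

    Assumption: this is the 'standard guessing correction' the article
    refers to; the source does not state the formula explicitly.
    """
    if num_choices < 2:
        raise ValueError("an item needs at least two choices")
    return num_right - num_wrong / (num_choices - 1)


# Hypothetical student: 8 right and 3 wrong on the 11 retained items.
example_score = corrected_score(8, 3)
print(example_score)
```

Under this rule a student who guesses blindly on every four-choice item has an expected corrected score of zero, which is the usual rationale for the penalty.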


Design 


A repeated measures design was used. Subjects were 
randomly assigned to three treatment groups. Each 
group received the same combination of lecture note 
types but for different lectures as shown in Table 1. 
The three types were full notes, partial notes, and no 
handouts at all so that students were required to make 
their “own notes.” 
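
The counterbalancing in Table 1 can be read as a cyclic rotation of the three note types across groups (a Latin square). The sketch below regenerates that layout; the function and variable names are illustrative only, and the starting order of the note types is chosen to match Table 1.

```python
NOTE_TYPES = ["Own notes", "Partial notes", "Full notes"]


def latin_square_design(note_types):
    """Map each group number to its note type for Lectures 1..k by cyclic rotation."""
    k = len(note_types)
    return {
        group + 1: [note_types[(group + lecture) % k] for lecture in range(k)]
        for group in range(k)
    }


design = latin_square_design(NOTE_TYPES)
for group, schedule in design.items():
    print(group, schedule)
```

Each note type appears once per group and once per lecture, so differences between lectures and between groups are balanced across the three treatments.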


Procedure 


The prequestionnaire was administered during the 
last electronics lecture of the first term of 1976. It was 
explained to the subjects that the questionnaire was 
intended to ascertain their preferences for different 
methods of learning from lectures. No other reasons 
were given, nor was any mention made of the fact that
an experiment would take place during the following
term.

During the afternoon laboratory session of the day
prior to the first lecture in the sequence, the subjects 
were handed their appropriate notes for the experi- 
mental lecture sequence. The nature of the laboratory 
session made it unlikely that there would be time for the 
subjects to discuss the differences in the handouts. The 
subjects were told to bring the notes to the lecture the
next day. 

At the start of the first lecture, the subjects were told 
of the different types of lecture notes that they had re- 
ceived and that this was related to the questionnaire
they had completed during the first term. No particular
instructions on how to use the handouts were given.
At all stages of the experiment, novelty aspects were
minimized and the whole procedure was made as close
to the normal teaching situation as possible. Therefore,
no attempt was made to prevent the students from
discussing their notes despite the responses given to the
third question of the prequestionnaire. Due to a slight
variation in pace, the third lecture carried over into the
first 10 minutes of the subsequent lecture. Lectures 1
and 2 remained within their allotted period of 50 minutes.

Three days after the last lecture in the experimental 
sequence, the postquestionnaire was administered 
during a normal lecture period. Four weeks after the 
completion of the last lecture, the subjects sat the 
midterm examination, which included the multiple- 
choice test. 


Analysis 


Achievement. The experimental subjects were the 
57 students who had (a) completed the prequestion- 
naire, (b) attended all the lectures, (c) completed the 
postquestionnaire, and (d) completed the multiple- 
choice test. 

The test papers were marked to provide three scores 
for each subject. These scores measured the subject’s 
performance on the content contained in each lecture. 
These three scores were each converted to standard 
scores with a mean of 500 and a standard deviation of 
100. A two-way analysis of variance with unweighted 
means suitable for a design with repeated measures on 
one factor was carried out. The two factors were
treatment and preference.
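
The conversion to standard scores described above can be sketched as follows. This is an illustration under stated assumptions: the article does not say whether the population or the sample standard deviation was used (the population form is assumed here), and the raw scores in the example are hypothetical.

```python
from statistics import mean, pstdev


def to_standard_scores(raw, target_mean=500.0, target_sd=100.0):
    """Linearly rescale raw scores to the given mean and standard deviation.

    Assumption: the population SD (pstdev) is used; the source does not specify.
    """
    m = mean(raw)
    s = pstdev(raw)
    if s == 0:
        raise ValueError("all raw scores are identical; cannot standardize")
    return [target_mean + target_sd * (x - m) / s for x in raw]


# Hypothetical corrected scores on one lecture's items.
example = to_standard_scores([1.0, 2.0, 3.0, 4.0])
print([round(v, 1) for v in example])
```

Putting the three lecture subtests on a common scale in this way lets scores from subtests with different numbers of items be compared within the same analysis.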


Preference 


It was assumed that the three types of lecture notes
used in the study represent three points on a continuum
that ranges from own notes to full notes. Therefore, the
changes in preference from the prequestionnaire to the
postquestionnaire were treated as having direction, and
the sign test was used to test the changes for signifi-
cance.

Table 2
Mean Achievement Scores and Standard Deviations Following Three Types of
Lecture Notes

                                  Treatment
Preference            Own notes   Partial notes   Full notes   Total
Own notes (12)
  M                     497.0        533.7          505.6      512.1
  SD                     76.2         66.8          119.1       91.6
Partial notes (31)
  M                     475.3        512.9          498.3      495.5
  SD                    103.3        111.8           94.0      104.4
Full notes (14)
  M                     458.4        475.8          560.2      498.1
  SD                     94.0        106.6           36.5       95.7
Total
  M                     475.7        508.2          515.0      499.6
  SD                     96.8        104.6           93.8      100.0

Note. Numbers in parentheses are ns.

Table 3
Analysis of Variance for Differences in Achievement Following Three Types of
Lecture Notes

Source of variation                  df       MS        F
Between subjects                     56
  Preference (A)                      2      3,826     <1
  Subjects within groups             54     15,554
Within subjects                     114
  Treatment (B)                       2     24,881     3.60*
  A × B                               4     17,178     2.48*
  B × Subjects Within Groups        108      6,919

* p < .05.


Results 
Achievement 


Table 2 shows the mean achievement 
scores obtained in the experiment. The 
analysis of variance summary in Table 3 
shows no significant relationship between 
the subjects' preferences and their achieve- 
ment. However, the differences between the 
treatment means were significant at the .05 
level. Further, the interaction between 
preferences and treatments was also signif- 
icant at the .05 level. The nature of this in- 
teraction is shown in Figure 1. 
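
The F ratios reported in Table 3 can be checked from its mean squares. The sketch below assumes the usual error terms for a design with repeated measures on one factor: the between-subjects factor (preference) is tested against subjects within groups, and the within-subjects terms (treatment and the interaction) are tested against the Treatment × Subjects Within Groups mean square. The dictionary keys are illustrative names, not the authors'.

```python
# Mean squares as printed in Table 3.
MEAN_SQUARES = {
    "preference": 3826,
    "subjects_within_groups": 15554,
    "treatment": 24881,
    "preference_x_treatment": 17178,
    "treatment_x_subjects_within_groups": 6919,
}


def f_ratio(ms_effect, ms_error):
    """F is the effect mean square divided by its error mean square."""
    return ms_effect / ms_error


f_preference = f_ratio(MEAN_SQUARES["preference"],
                       MEAN_SQUARES["subjects_within_groups"])
f_treatment = f_ratio(MEAN_SQUARES["treatment"],
                      MEAN_SQUARES["treatment_x_subjects_within_groups"])
f_interaction = f_ratio(MEAN_SQUARES["preference_x_treatment"],
                        MEAN_SQUARES["treatment_x_subjects_within_groups"])

print(round(f_preference, 2), round(f_treatment, 2), round(f_interaction, 2))
```

The recomputed ratios agree with the tabled values of less than 1, 3.60, and 2.48.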


Preferences 


Table 4 shows the distribution of re-
sponses to the first two questions in the
questionnaires. Question 1 asked the stu-
dents which of the three types of lecture
notes they would prefer, and Question 2
asked them which type of notes they con-
sidered would be best for passing exams.
Fifty of the 57 students responded in the
same way to Questions 1 and 2 in the pre-
questionnaire, whereas 49 did so in the
postquestionnaire. The changes in the
distributions of responses from the pre-
questionnaire to the postquestionnaire were
significant at the .001 level for both ques-
tions.

Table 4
Responses to Questions 1 and 2 on the Prequestionnaire and Postquestionnaire

                        Notes preference
Item                  Own    Partial    Full
Prequestionnaire
  Question 1           12      31        14
  Question 2           15      28        14
Postquestionnaire
  Question 1            8      19        30
  Question 2            7      22        28
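
The sign test applied to the preference changes can be sketched as an exact two-sided binomial test on the direction of each student's change along the own notes to full notes continuum. The article reports only the marginal distributions, not the individual changes, so the counts in the example are hypothetical and serve only to show the computation; students whose preference did not change are assumed to be dropped, as is conventional.

```python
from math import comb


def sign_test_p(n_plus, n_minus):
    """Exact two-sided sign test p value under H0: P(plus) = P(minus) = 0.5."""
    n = n_plus + n_minus
    if n == 0:
        return 1.0  # no changes at all, so no evidence against H0
    k = min(n_plus, n_minus)
    # Probability of a split at least as lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts: 20 students shift toward full notes, 3 toward own notes.
p_value = sign_test_p(20, 3)
print(p_value)
```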


Discussion 


The results indicate that the kinds of 
notes used in this study are significantly 
different in their effects on achievement. 
Further, the results suggest that the external 
storage function of notes in the normal uni- 
versity situation is an important one, with 
the order of the treatment means being full 
notes, partial notes, and own notes. How- 
ever, there was also a significant disordinal 
interaction between preference and treat- 
ment. Those students whose preference was 
for full notes had the lowest mean on the own 
notes and partial notes treatments, but they 
had the highest mean when given the full 
notes treatment. Both the partial notes 
preference students and the own notes
preference students had higher scores when
given the partial notes treatment, which
obviates the conclusion that students rou-
tinely do best when given their lecture note
preference. Clearly all groups had higher
achievement when given some form of lec-
ture notes handouts.


The responses to the questionnaires show 
that almost all students preferred the lecture 
notes that they thought would be most effi- 
cient for passing examinations and that fol-
lowing exposure to the three kinds of notes,
their opinions about which type of notes is 
most effective shifted in favor of the full 
notes, although a substantial proportion of 
the students still preferred the partial notes. 
These changes in preference are in accord 
with the achievement data. 

The responses to the open-ended ques-
tions in the postquestionnaire were varied,
but by far the most common responses were
either that the students wanted to have full 
notes in addition to their own notes or that 
they wanted the full notes but with spaces 
left for them to add their own notes. Annis 
and Davis (1975) provided support for the 
idea that providing full notes in addition to 
the students’ own notes gives rise to high 
levels of achievement.

On the postquestionnaire 29 students in-
dicated that they had read some of the other 
handouts distributed for the sequence of 
lectures, and 17 students indicated that they 
had exchanged handouts with other stu- 
dents. Further, 12 students indicated that 
they had photocopied some of the other 
handouts. These data are indicative of 
considerable interplay between treatment 
groups that could not be avoided without
making the situation an artificial one. It 
seems reasonable to suggest that under more 
strictly controlled conditions, the effects of 
the experimental treatments would be more 
significant than they were in the present
study.

In conclusion, the results of this investi-
gation suggest that the efficiency of lectures
can be improved by the distribution of some
form of notes. The results suggest that
given the note forms used in the present 
study, students should be offered a choice 
between partial or full notes whenever this 
is practical. However, it would be easy to 
adapt the partial notes and full notes used in 
the present study by, for example, adding 
content to the partial notes or leaving spaces 
in the full notes so that students can add 
their own notes. It might be that an adap- 
tation of this kind could cater to all students 
without the need for different notes for stu- 
dents with different preferences. However, 
whether this is the case or not can only be 
decided on the basis of future research.


The focus of this study was the relationship between the science career com-
mitment and the science teacher models of 141 female and 129 male high
school students. On the basis of earlier findings, it was predicted that stu-
dents with same-sex teacher models would indicate a higher science career
commitment. Furthermore, it was predicted that perceived teacher attrac-
tiveness and amount of science-related teacher contact would affect the influ-
ence of same-sex teacher models more than that of opposite-sex teacher mod-
els. The results supported the hypotheses. Implications of the results for fe-
male participation in science are considered.


The scarcity of female teachers in tradi-
tionally masculine academic fields is a
problem that has recently been addressed by
several writers (e.g., Fox, 1974; Husbands,
1972; O’Leary, 1974; Tidball, 1973). These
writers are concerned that the lack of female
role models discourages females from setting
career goals in traditionally masculine fields.
In her review of issues related to women’s
occupational aspirations, O’Leary (1974)
suggested that the lack of female models in
nontraditional work roles is, in fact, a barrier
to the selection of nontraditional roles by
women.

In support of this view, Fox (1974) re-
ported that among students in five univer-
sities, the proportion of male and female
majors in each discipline was related to the
proportion of male and female teachers in
the discipline. Although this study was
correlational in design, it does suggest that
the narrow range of occupational choices of
the majority of undergraduate women has
been based, in part, on the lack of female
models in traditionally masculine fields.

Douvan (1976) has recently commented
on the importance of role models for those
women students who do aspire to nontradi-
tional careers. She believes that women
benefit greatly from exposure to several fe-
male teachers who present diverse models of
female professionalism and who clearly enjoy 
their careers. That role models are partic- 
ularly important to women making nontra- 
ditional career choices is consistent with a 
study by Almquist and Angrist (1971) of 
career-salient and non-career-salient 
women. These researchers found that oc- 
cupational role models (e.g., teachers) had 
been important in the occupational choices 
of 68% of the career-salient group in contrast 
to only 23% of the non-career-salient group.

In spite of this interest and concern re-
garding the influence of same-sex role
models on career choice, there has been little
controlled research to test the relationship
between the availability of models and
long-term achievement aspirations and be-
havior. However, there have been a number
of studies of the effect of same-sex and op-
posite-sex models on various behaviors im-
mediately following exposure to models.
Many of these studies have included pre-
school subjects exclusively, and in these
studies, a consistent tendency to imitate
same-sex models has not been found (Mac-
coby & Jacklin, 1974). However, these
negative findings are not surprising, since
gender constancy does not develop until
sometime during the preschool years
(DeVries, 1969). Studies of older subjects
suggest that same-sex models may have a
more powerful effect for subjects beyond the
preschool years. For example, Maccoby and
Wilson (1957) found that seventh graders
tended to retain information regarding
same-sex models more than information
regarding opposite-sex models, and Ward
(1969) found that elementary school children
were more likely to imitate a same-sex adult
than an opposite-sex adult as they per-
formed an experimental game.

There is some evidence that same-sex 
models may be more effective than oppo- 
site-sex models in eliciting behavior that has 
traditionally been considered sex inappro-
priate. In two experimental studies, Wolf
(1973, 1975) exposed children to a same-sex
or opposite-sex peer who was playing with a
sex-inappropriate toy. After viewing a live 
model (Wolf, 1973) and after viewing a tel- 
evised model (Wolf, 1975), children were 
more likely to imitate the same-sex peer. 
Same-sex models who perform sex-inap- 
propriate (i.e., nontraditional) occupational 
roles may have a similar effect on occupa- 
tional goal setting among children. Tibbetts 
(1975) reported that 90% of the elementary
school girls in her sample thought that when
they grew up they might like the job of bus 
driver, a nontraditional role for women. In 
exploring this surprising finding, Tibbetts 
found that these children rode school buses 
driven by women. She concluded that when 
children can view others in nonstereotyped 
roles, children are then able to consider these
roles for themselves.

The evidence presented suggests that
subjects beyond the preschool years may be
more likely to imitate a same-sex model than
an opposite-sex model; furthermore, Tib-
betts's incidental finding suggests that for
females, this willingness to imitate same-sex
models can extend to a willingness to con-
sider traditionally masculine occupational
roles. One purpose of this study was to test
the effect of occupational role models on
commitment to traditionally masculine ca-
reers. The general occupational field se-
lected as a focus for this study was the field
of science because science has always been
a male-dominated field, but one that has the
potential to offer a wide variety of rewarding
careers to women. On the basis of the evi-
dence presented regarding the influence of
same-sex and opposite-sex models, it was
hypothesized that for both male and female
high school students, commitment to a ca- 
reer in science would be greater among those 
students who have a same-sex teacher as 
their primary science model.

In spite of the evidence for the greater
influence of same-sex models in subjects 
with gender constancy, some investigators 
have found no difference in these subjects’ 
imitation of same-sex and opposite-sex 
models (e.g., Bandura & Kupers, 1964; 
Masters, 1972). In most of these studies, 
exposure to models has been brief, and often 
little or no interaction has occurred between 
the models and the subjects who view the 
models. This failure to take into account 
the nature of the relationship between the 
model and the viewer may be one reason for 
the inconsistent results in this area. Clearly, 
there is a need to consider those aspects of 
the model-viewer relationship that may af- 
fect the influence of same-sex and oppo- 
site-sex models. The second purpose of this 
study was to explore the effect of model- 
viewer relationship variables on the influ- 
ence of same-sex and opposite-sex models. 
Two relationship variables should be of 
crucial importance in determining the effect 
of a model. These variables are (a) the ex-
tent to which the model is perceived by the 
viewer to be an attractive person whom the 
viewer would like to imitate and (b) the ex- 
tent to which the viewer has the opportunity 
to be in contact with the model in the context 
of the behavior to be imitated. It was ex- 
pected that the viewer would be more likely 
to copy a model whom he or she sees as at-
tractive and with whom he or she has had
greater contact. Furthermore, given that 
the same-sex models have a greater influence 
in general, then these relationship variables 
should be particularly important in deter- 
mining the influence of same-sex models. 
Hence, it was predicted that these relation- 
ship variables would be more important in 
determining the effect of same-sex models. 
More specifically, it was hypothesized that 
for students with a same-sex teacher model, 
commitment to a career in science would be 
higher (a) among those who view their 
teacher model as an attractive individual 
whom they would like to imitate and (b)
among those who have had greater science-
related contact with their teacher model. 
For students with an opposite-sex teacher 
model, these effects were predicted to be less 
important. 


Method 


Subjects 


The subject pool was comprised of 143 male and 145 
female senior high school students from 14 Missouri 
high schools. All subjects were presently enrolled in 
science courses that were taught by teachers who had 
participated with their students in science research 
competitions. 


Measures 


Sex of model. Because most of the subjects had had 
more than one high school science teacher, it could not 
be assumed that their present teachers were their pri- 
mary science models. Therefore, the following question 
was devised to determine the sex of the students’ pri- 
mary science model: “When you think about science, 
which of the science teachers you have had comes to 
mind most often?” 

Perceived attractiveness of model. To determine 
the attractiveness of the science model as perceived by 
the student, the students were asked to indicate on a 
7-point scale the extent to which they would like to 
become like their science teacher model. 

Science-related contact. Although all of the stu-
dents had science-related teacher contact in regard to 
regular classroom activities, the students differed in 
their amount of involvement in original research 
projects. When these original research projects are 
done at the high school level, they require the close 
guidance and supervision of the science teacher; 
therefore, the extent to which students carried through 
with their original research projects served as a measure 
of the extent of the student’s individual science-related 
teacher contact. Students who indicated that they had 
thought about doing an original scientific research 
project were asked to indicate on a checklist all of the 
activities that they had performed as a part of that 
project. On the checklist were activities that required 
little or no individual teacher contact (e.g., “Thought 
about it on my own” and “Talked about it with teacher 
a little”), items that required a moderate amount of 
individual teacher contact (e.g., “Collected, built, or 
purchased materials or wrote questionnaires, etc., 
needed for the project”), and items that required a large 
amount of individual teacher contact (e.g., “Submitted 
a project report to some competition or scientific orga- 
nization such as Science Fair, Junior Academy of 
Science, Junior Science, etc.”). 

Commitment to a science career. To determine the 
students’ commitment to a science career, students were 
asked, “To what extent would you like to have a career 
in science?” Students indicated the strength of their 
commitment on a 7-point scale.


Procedure


During a symposium for high school science students, 
10 male and 10 female teachers who had worked with 
students on original science research projects were asked 
to participate in the study. Although more male 
teachers were present at the symposium, an equal 
number of male and female teachers were asked to 
participate in order to obtain more equal numbers of 
male and female subjects with male and female teacher 
models. The study was described to the teachers as an 
investigation of factors affecting high school students’ 
interest in science careers. At that time the teachers 
were introduced to the Science Career Attitude Survey 
(SCAS). The SCAS, which was created by the present 
authors, included the measures described above in ad- 
dition to several other measures that are not pertinent 
to the hypotheses under consideration here. 

The teachers were asked to administer the SCAS to 
one of their senior high school science classes in which 
some students had successfully completed an original 
research project. The purpose of this stipulation was 
to insure that some students in each class would have 
experienced a large amount of individual science-related 
teacher contact. In order that students would not feel 
inhibited in responding honestly to the questions, 
teachers were asked to allow their students privacy in 
completing the SCAS. Teachers were promised feed- 
back regarding the general results of the study; however, 
it was understood that to insure confidentiality, no re- 
sponses of individual students were to be shared with 
teachers. Because most of the schools were not local, 
the teachers were instructed to return the question- 
naires to us by mail. Fourteen teachers returned the 
questionnaires; usable questionnaires were received 
from 129 males and 141 females. 


Design 


Subjects were divided into low, medium, and high 
perceived-teacher-attractiveness groups on the basis 
of the attractiveness-of-model measure. The effect of 
perceived teacher attractiveness on commitment to a
science career for students with same-sex and oppo-
site-sex teacher models was tested in a 3 × 2 × 2 (At-
tractiveness × Sex of Student × Sex of Teacher Model)
factorial design.

Subjects who had been involved in individual re- 
search projects were divided into low, medium, and high
teacher contact groups on the basis of the teacher con-
tact measure. The effect of teacher contact on com-
mitment to a science career for students with same-sex
and opposite-sex teacher models was tested in a 3 × 2
× 2 (Contact Level × Sex of Student × Sex of Teacher
Model) factorial design.


Results 


One focus of this study was the relation-
ship between perceived teacher attractive-
ness and science career commitment. The
mean science career commitment scores,
standard deviations, and number of subjects
for low, medium, and high teacher attrac-
tiveness groups are presented in Table 1.
The effect of teacher attractiveness, sex of
student, and sex of teacher on science career
commitment was tested by a three-way
least-squares analysis of variance. The re-
sults of this analysis appear in Table 2.

Table 1
Mean Science Career Commitment Scores for
Perceived-Teacher-Attractiveness Groups

                             Male student              Female student
Perceived teacher         Male         Female        Male         Female
attractiveness group      teacher      teacher       teacher      teacher
Low
  M                       4.08         [illegible]   [illegible]  3.65
  SD                      1.52         [illegible]   [illegible]  1.69
  n                       9            11            11           11
Moderate
  M                       5.91         [illegible]   [illegible]  5.51
  SD                      1.40         [illegible]   [illegible]  1.37
  n                       34           23            30           32
High
  M                       5.34         [illegible]   [illegible]  5.61
  SD                      1.37         [illegible]   [illegible]  1.91
  n                       41           11            32           25

The main effect of perceived attractive-
ness was significant (p < .02). Subjects in
the moderate- and high-attraction groups
had higher science career commitment
scores than did students in the low-attrac-
tion groups. However, since the attrac-
tiveness measure was based on the concept
of “being like” the teacher model, this main
effect may simply mean that students who
aspire to science careers would like to have
work roles similar to those of their science
teachers.

The interaction between sex of teacher
model and sex of student was significant (p
< .02). As hypothesized, girls with female
teacher models (M = 5.25) and boys with
male teacher models (M = 5.44) had higher
science career commitment scores than girls
with male teacher models (M = 4.83) or boys
with female teacher models (M = 4.85). No
other second-order interaction effects were
significant.

The three-way interaction of teacher at-
tractiveness, sex of student, and sex of
teacher was also significant (p < .01). As the
interaction effect of the highest order, this
effect explains the main effect of perceived
teacher attractiveness and the second-order
interaction of sex of teacher and sex of
subject. The interaction is consistent with
the hypothesis set forth regarding the effect
of the attractiveness variable. Those males
and females with the highest science career
commitment scores were those who had
same-sex teacher models whom they per-
ceived as moderately or highly attractive;
those with the lowest scores were those who
had same-sex models whom they perceived
as low in attractiveness. Within each sex,
subjects with opposite-sex teacher models
had lower science career commitment scores
than subjects with attractive same-sex
models and higher scores than subjects with
unattractive same-sex models. Hence, the
variable of perceived attraction was impor-
tant in determining the science career com-
mitment of those students who had a same-
sex teacher model but was not important in
determining the science career commitment
of those students who had an opposite-sex
teacher model.

A second focus of this study was the
relationship between science-related teacher
contact and science career commitment.
Those subjects who did not use the checklist
measure of teacher contact were not
included in the analysis of the teacher contact
variable because amount of teacher contact
was ambiguous for these subjects. Hence,
the analysis of teacher contact scores
included a subset of 106 of the 129 total males
and 95 of the 141 total females. The means


Table 2
Summary Table of Least-Squares Analysis of
Science Career Commitment Scores for
Teacher Attractiveness Groups

Source                              SS      df    F
Level of perceived teacher
  attractiveness (A)               24.47     2    [illegible]
Sex of subject (B)                  2.79     1    1.01
Sex of teacher (C)                   .09     1    <1
A × B                               2.46     2    <1
A × C                               1.78     2    <1
B × C                              16.13     1    5.85*
A × B × C                          31.53     2    5.72**
Error (within cells)              190.49   258

* p < .02.
** p < .01.
Table 3
Mean Science Career Commitment Scores for
Science-Related Teacher Contact Groups

Science-related        Male student          Female student
teacher contact       Male     Female       Male     Female
group                teacher   teacher     teacher   teacher
Low
  M                   5.17      4.65        5.29      4.92
  SD                  1.45      2.02        1.56      1.51
  n                     18        13          16        17
Moderate
  M                   5.41      5.91        5.07      5.58
  SD                  1.45      1.40        1.41      1.91
  n                     33        11          15        10
High
  M                   6.18      4.83        4.91      6.13
  SD                   .95      1.81        1.73      1.27
  n                     21        10          17        20


and numbers of students for low, medium, 
and high teacher contact groups appear in 
Table 3. 

The effect of teacher contact, sex of 
teacher, and sex of student on science career 
commitment was tested in a three-way 
least-squares analysis of variance. The re- 
sults of this analysis are presented in Table 
4. None of the main effects were significant. 
The significant interaction between sex of 
student and sex of teacher found for the total 
group was again found for this subset of 
students (p < .05), even though the mean 
science career commitment scores were 
higher for this subset overall. Males with 
male teacher models (M = 5.57) and females 
with female teacher models (M = 5.57) ex- 
pressed greater career commitment than 
males with female teacher models (M = 5.11) 
or females with male teacher models (M = 
5.09). No other second-order interactions 
were significant. 
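
The four sex-of-student by sex-of-teacher means quoted above can be recovered from the cell entries of Table 3 as n-weighted averages over the three contact levels. The following is a minimal Python sketch of that arithmetic. The dictionary layout and the names CELLS, weighted_mean, and MARGINALS are illustrative assumptions, not part of the original analysis, and differences in the last digit are to be expected because the published cell means are already rounded.

```python
# Cell means and cell sizes transcribed from Table 3,
# ordered as low, moderate, high teacher contact.
# Keys are (sex of student, sex of teacher); this layout is illustrative.
CELLS = {
    ("male", "male"):     {"mean": [5.17, 5.41, 6.18], "n": [18, 33, 21]},
    ("male", "female"):   {"mean": [4.65, 5.91, 4.83], "n": [13, 11, 10]},
    ("female", "male"):   {"mean": [5.29, 5.07, 4.91], "n": [16, 15, 17]},
    ("female", "female"): {"mean": [4.92, 5.58, 6.13], "n": [17, 10, 20]},
}


def weighted_mean(means, ns):
    """Average of group means, each weighted by its group size."""
    return sum(m * n for m, n in zip(means, ns)) / sum(ns)


# Mean commitment score for each student-teacher pairing,
# collapsed over the three contact levels.
MARGINALS = {
    pairing: weighted_mean(cell["mean"], cell["n"])
    for pairing, cell in CELLS.items()
}

# Subset sizes, for comparison with the 106 males and 95 females in the text.
N_MALES = sum(sum(c["n"]) for (student, _), c in CELLS.items() if student == "male")
N_FEMALES = sum(sum(c["n"]) for (student, _), c in CELLS.items() if student == "female")

for pairing, value in MARGINALS.items():
    print(pairing, round(value, 2))
```

The same-sex pairings come out near 5.57 and the opposite-sex pairings near 5.1, which is the pattern the sex-of-student by sex-of-teacher interaction describes.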

The interaction of teacher contact, sex of 
teacher, and sex of student was significant (p 
<.05). This interaction is consistent with 
the hypothesis regarding the teacher contact 
variable. The highest levels of science career 
commitment were found among those males 
and females who had had large amounts of 
same-sex teacher contact. Science career 
commitment scores were lower for males and 
females who had had large amounts of op- 
posite-sex teacher contact than the scores 
were for any groups who had had same-sex 
teacher contact. That is, a large amount of 
same-sex teacher contact was related to
higher science career commitment scores, 
but a large amount of opposite-sex teacher 
contact was not. 

Discussion

The results of this study indicate that
students who have had same-sex teacher
models have a greater science career
commitment than students who have had
opposite-sex teacher models. This finding
held for both females and males. However,
the influence of same-sex teacher models was
related to the teacher's attractiveness as
viewed by the student. Only those students
who viewed their same-sex teacher as
moderately or highly attractive reported higher
science career commitment. In fact, the
mean commitment score of those students
who saw their same-sex teacher as low in
attractiveness was lower than the mean
commitment score of students with
opposite-sex teacher models. This interaction
suggests that same-sex teacher models can
have a more powerful effect both in
encouraging and in discouraging science career
commitment.

For students who worked on individual
research projects, the effect of teacher
contact on science career commitment was
also considered. These effects were similar
to those found for the attractiveness
variable: Same-sex teachers were associated
with greater career commitment only among
those students who had had large amounts
of individual teacher contact. Hence, the


Table 4
Summary Table of Least-Squares Analysis of
Science Career Commitment Scores for
Teacher Contact Groups

Source                              SS      df    F
Level of teacher contact (A)       12.07     2    2.48
Sex of subject (B)                   .46     1     .18
Sex of teacher (C)                   .01     1    <1
A × B                         [illegible]    2    <1
A × C                               8.12     2    [illegible]
B × C                              10.45     1    [illegible]
A × B × C                          15.76     2    [illegible]
Error (within cells)              507.26   189

* p < .05.
highest mean science career commitment
was found for the high-contact groups who 
had had same-sex teacher models. 

These findings are consistent with earlier 
modeling research that indicates that models 
perceived as more similar are more likely to 
be imitated (Maccoby & Jacklin, 1974). 
However, these results suggest that simi- 
larity (in this case, having a same-sex model) 
does not necessarily lead to greater imita- 
tion: Relationship variables affect the 
viewer’s choice to adopt or not to adopt the 
same-sex model’s behavior. These findings 
may be useful in understanding why some 
studies of older subjects (who have clearly 
attained sex constancy) have not found a 
difference in the effects of same-sex and 
opposite-sex models. In these studies, the 
model-viewer relationship variables have 
seldom been considered; yet, if the same-sex 
model is viewed as unattractive or if little 
personal contact takes place between the 
same-sex model and the viewer, the viewer 
may not adopt the model's behavior. 

The Science Career Attitude Survey
included a question that asked the student to
indicate the teacher who had helped him or
her most with a research project. This
question was meant to be an indication of the
sex of the teacher with whom the student
had had individual science-related contact.
Unfortunately, for some reason, many
students who reported working on a project did
not respond to this question. Since the
question appeared after the sex-of-model
measure and since it was presented in a
similar format, perhaps the students
regarded it as redundant. Because this
information was missing for many subjects, the
sex-of-model measure was used in its place.
Hence, in testing for the effect of teacher
contact on science career commitment, the
assumption was made that the sex-of-model
measure was an accurate indicator of the sex
of the teachers with whom the students had
had individual contact. It is certainly
possible that in some cases, the sex of teachers
with whom students worked was different
from the sex indicated by the sex-of-model
measure. However, this issue may not be
important to the arguments presented here.
In light of the significant interaction effect
of teacher contact, sex of student, and sex of
teacher on science career commitment, it
appears that the sex of the teacher whom the
subject most clearly associated with science
(as indicated by the sex-of-model measure)
is significantly related to science career
commitment and to the association between
teacher contact and science career
commitment.

The correlational nature of this study is 
both a strength and a weakness of the study. 
Because a naturalistic setting was used, the 
artificial nature of an experimental setting 
was avoided. Instead, the focus of study was 
on models who have had numerous interac- 
tions with the subjects and who have had 
(presumably) an actual impact on their at- 
titudes toward a science career. Hence, this 
study provided a test of the generalizability 
of same-sex and opposite-sex modeling ef- 
fects to a natural setting. 

On the other hand, the correlational na- 
ture of the study is also a weakness in that 
the findings cannot be interpreted as cause 
and effect relationships. For example, the 
sex-of-teacher-sex-of-student interaction 
may mean that same-sex teachers have a 
greater influence in promoting science career 
commitment, or it may mean that students 
who are highly committed to a science career 
are more likely to seek out same-sex models. 
It is difficult to choose between these alter- 
natives in a correlational study. However, 
the three-way interactions found in each 
analysis are useful in considering these
alternatives. In regard to the teacher contact
variable, note that all students in the
high-contact group had completed an original
research project apart from regular class
assignments. It seems reasonable to assume
that all of these students would have a higher
science career commitment than the other
students. In fact, the means of the same-sex,
high-contact groups do indicate a higher
level of science career commitment; however,
the means of opposite-sex, high-contact
groups are no higher than the means of other
groups. This three-way interaction of sex of
teacher, sex of student, and teacher contact
suggests something more than the active
selection of same-sex teacher models by
highly committed students: The nature of
the personal relationship within same-sex
teacher-student pairs appears to be more
effective in fostering or maintaining science
career commitment.

This reasoning can also be applied in the 
case of the attractiveness variable. Students 
who aspired to a science career indicated 
that they would like to become like their 
science teachers in at least some respects, 
and one is not surprised that those interested 
in science would wish to become at least 
somewhat like their science teachers. 
However, only those subjects with same-sex 
attractive teacher models had higher career 
commitment. As with the teacher contact 
interaction, this interaction also suggests 
that a same-sex model is more effective than 
an opposite-sex model in promoting or 
maintaining plans for a science career. 

Statistics were gathered on the sex ratios
in the senior high school science
departments that participated in the science
symposium described in the Procedure section.
Across all schools, 23% of the science teach- 
ers were women; yet, among the students 
who submitted a research paper to the 
symposium, 44% were girls. Hence, the ratio 
of interested high school science students to 
same-sex science teachers was 3.3 for the 
male students and 7.4 for the female stu- 
dents. The results of the present study 
suggest that individual teacher contact with 
same-sex models is an important factor in 
fostering and/or maintaining science career 
commitment. Yet, in these schools, girls 
have less opportunity than boys for same-sex 
individual teacher contact. Consider also 
that the results of this study suggest that 
only same-sex teachers who are seen as at- 
tractive promote or maintain science career 
commitment. This is a particularly impor- 
tant issue for females because the total 
number of female science teachers is so 
small. What these findings mean is that the 
large numbers of high school girls who have 
an interest in science have less opportunity 
than their male counterparts to find a 
same-sex teacher whom they would like to 
copy. 

The obvious implication of these findings 
is that the scarcity of female science teachers 
in the high schools is a factor that discoura- 
ges girls from setting science career goals. 
This problem has been perpetual: Since 
females are discouraged from science, they
fail to become science models for other fe- 
males. The results of this study suggest that 
well-formulated affirmative action programs 
may be a way to change this nonparticipa- 
tion pattern of women: When a more equal 
proportion of female science teachers are 
available for interested high school girls, the 
science career commitment of these girls will 
be fostered. 
Effect of Teacher, Student, and Class Characteristics
on the Evaluation of College Instructors


Patricia B. Elmore and John T. Pohlmann 
Southern Illinois University at Carbondale 


This study was conducted to determine if student evaluations of instructors are
affected by the characteristics of the teacher, the student, and the class. Each
instructor was asked to answer questions indicating personal warmth,
professorial rank, years of teaching experience, sex, and class size. Students were
asked to complete the Instructional Improvement Questionnaire (IIQ). The
20 questions on the IIQ that directly evaluate instructor performance were
analyzed. Only the results for the first set of canonical functions are presented.
An instructor who received high scores on this canonical function would be
rated as encouraging student participation in the course, showing an interest
in students, knowing when students understood her or him, being available to
students, increasing appreciation for the course, and accepting criticism and
suggestions. The classes that received high values on this function were small
in size, were taught by instructors who rated themselves as warm, and had
students that expected high grades.


The purpose of the present study was to
investigate the effect of teacher, student, and
class characteristics on student evaluations
of teaching effectiveness. This study differs
from previous studies in that the four types
of variables are analyzed simultaneously.

Instructor characteristics such as sex,
academic rank, and warmth have been studied
to determine their effect on student ratings
of teaching effectiveness. Downie (1952)
reported that women faculty received
significantly higher ratings than men faculty for
the extent to which they brought new books
and authors into the classroom. Elmore and
LaPointe (1974) found that women faculty
received significantly higher ratings for
“promptly returned homework and tests,”
while men faculty received significantly
higher ratings on “spoke understandably.”
Two studies (Aleamoni & Graham, 1974;
Aleamoni & Yimer, 1973) found no
relationship between student ratings and
instructor rank, while Villano (Note 1) found
that associate and full professors received
higher ratings than instructors and assistant
professors. Teacher warmth was found to
be an important variable influencing student
ratings in six studies (Baird, 1973; Costin &
Grush, 1973; Elmore & LaPointe, 1975;
Isaacson, McKeachie, & Milholland, 1963;
McKeachie & Lin, 1971; McKeachie, Lin, &
Mann, 1971).

Results from studies examining the effect
of student characteristics are as conflicting
as the studies related to teacher
characteristics. Two studies (Goodhartz, 1948;
Isaacson et al., 1964) found no differences
between faculty ratings made by male and
female students. Bendig (1952) found that
women students rated their instructors
(men) significantly lower than the male
students rated them, Elliott (1950) found
that women students tended to give higher
ratings in “presentation of the subject
matter” than male students, and Elmore and
LaPointe (1974, 1975) found that female
students rated instructors higher in
“specified objectives of the course.”

The relationship between student ratings 
of instructors and expected grade in the 
course was found to be nonexistent in a study 
by Kennedy (1975) and positive in a study by 
Pohlmann (1975). Frey, Leonard, and 
Beatty (1975) found that the students’ grade 
point averages did not systematically vary 
with their ratings; however, more senior
students rated instructors more favorably
than their less experienced classmates.
Centra and Linn (Note 2) reported that
lowerclassmen (versus upperclassmen) and
students with higher expected grades and
high cumulative grade point averages tended
to rate the examinations, course quality, and
the text higher. Lunney (Note 3) reported
that freshmen tended to rate instructors
lowest, while juniors and seniors rated them
highest.

A study by Christensen and Bourgeois
(Note 4) revealed a linear trend toward more
favorable evaluations as students spend
more time on the course outside of class.

Two class characteristics variables, the
class size and the required-elective status of
the course, were included in the present
study. Class size was found to have no
relationship to student ratings by Aleamoni
and Graham (1974) and Lunney (Note 3)
and a positive relationship with student
ratings by Villano (Note 1). Gage (1961),
Lovell and Haner (1955), and Pohlmann
(1975) found that teachers of elective courses
received higher ratings than teachers of
required courses.


Method


The data for this study were obtained in conjunction
with the university-wide student-evaluation-of-instruction
program at Southern Illinois University at
Carbondale during the 1973-74 academic year. The
Instructional Improvement Questionnaire (IIQ; Elmore
& Pohlmann, 1975) is designed to collect data on
student and class characteristics and on student
evaluations of instructors and courses. In addition to the IIQ,
information for this study was obtained using a faculty
information form.

The four types of variables analyzed were teacher
characteristics, student characteristics, class
characteristics, and student ratings of instructors. The
instructor rating items from the IIQ appear in Table 1.

The student and class characteristics variables
were level of course (1 = freshman, 2 = sophomore, and
[remainder illegible]), GPA (mean grade point average of students
enrolled in the course), outside study hours (mean
number of hours per week reported in study activity
related to the course), general rating (average rating of
the general quality of instruction at Southern Illinois
University at Carbondale), expected grade (average
grade expected by students in the class), year in school
(average year in school of students in the class), sex
(percentage of students in the class that were female),
elective (percentage of students in the class taking the
course as an elective), and size (number of students
enrolled in the course).

A faculty information form was used to obtain the 
teacher characteristics variables analyzed. They were 
sex (0 = male and 1 = female), rank (1 = lecturer, 2 = 
instructor, 3 = assistant professor, 4 = associate pro- 
fessor, and 5 = professor), years of teaching experience 
(1 = 1 year or less, 2 = 2 to 5 years, 3 = 6 to 10 years, 4 
= 11 to 15 years, and 5 = 16 years or more), self-rating 
of personal warmth (1 = very, 2 = above average, 3 = 
moderately, 4 = somewhat, and 5 = not at all), and 
self-rating of primary teaching interest (1 = the student 
and 0 = course content). 
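
The coding scheme for the faculty information form can be written down as a small lookup table, which makes the direction of each scale explicit when reading the correlations and loadings. This is only an illustrative Python sketch: the field names and the names CODEBOOK and decode are assumptions introduced here, while the codes and labels are the ones listed above.

```python
# Codes and labels as listed for the faculty information form.
# Field names ("sex", "rank", ...) are hypothetical, chosen for illustration.
CODEBOOK = {
    "sex": {0: "male", 1: "female"},
    "rank": {
        1: "lecturer",
        2: "instructor",
        3: "assistant professor",
        4: "associate professor",
        5: "professor",
    },
    "experience": {
        1: "1 year or less",
        2: "2 to 5 years",
        3: "6 to 10 years",
        4: "11 to 15 years",
        5: "16 years or more",
    },
    "warmth": {
        1: "very",
        2: "above average",
        3: "moderately",
        4: "somewhat",
        5: "not at all",
    },
    "interest": {1: "the student", 0: "course content"},
}


def decode(record):
    """Translate numeric codes into labels; raises KeyError for an unknown field or code."""
    return {field: CODEBOOK[field][code] for field, code in record.items()}


example = decode({"sex": 1, "rank": 3, "experience": 2, "warmth": 1, "interest": 1})
print(example)
```

Note that on the warmth scale as listed, lower codes correspond to warmer self-ratings.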

The unit of analysis for this study was the class. The 
sample of classes included courses at all levels from all 
colleges within the university taught by instructors at 
each of the teaching ranks, with a range of class sizes 
from small (less than 10 students) to large (more than 
100 students). The number of classes used in the 
analysis was 174. 


Results 


Table 1 contains the correlations between
the mean ratings on the 20 IIQ instructor
evaluation items and the teacher-student-class
characteristics variables. The
teacher-student-class characteristics
variables do not correlate highly with student
ratings of instruction. The great majority
of the correlations are low and not
significantly different from zero (α = .01). This
indicates that a rather large portion of the
variance in student evaluations of instructors
is attributable to sources other than those
examined here.

Since the Pearson correlation is not sen- 
sitive to curvilinear relationships between 
variables, a number of one-way analyses of 
variance were calculated to determine eta 
square values. The eta square statistics, 
which assess the maximum degree of non- 
linear relationship, were compared to the r 
values derived from Table 1. This analysis 
indicated that the relationships reported 
could be assumed linear with no loss of 
meaning or changes in the interpretation. 
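
The logic of this check can be illustrated with a short sketch: eta squared from a one-way analysis of variance (between-groups sum of squares over total sum of squares) equals the squared Pearson r when the group means fall on a straight line and exceeds it when the relationship is curved. The Python below is a minimal illustration with made-up data, not the study's data; the function names pearson_r and eta_squared are assumptions introduced here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def eta_squared(x, y):
    """SS_between / SS_total, treating each distinct x value as one group."""
    grand = sum(y) / len(y)
    groups = {}
    for a, b in zip(x, y):
        groups.setdefault(a, []).append(b)
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand) ** 2 for vals in groups.values()
    )
    ss_total = sum((b - grand) ** 2 for b in y)
    return ss_between / ss_total


# Made-up data: three levels of a predictor, two observations per level.
x = [1, 1, 2, 2, 3, 3]
y_linear = [1, 3, 4, 6, 7, 9]   # group means 2, 5, 8 lie on a line
y_curved = [1, 3, 7, 9, 1, 3]   # group means 2, 8, 2 form an inverted U

for label, y in (("linear", y_linear), ("curved", y_curved)):
    r2 = pearson_r(x, y) ** 2
    print(label, "r^2 =", round(r2, 3), "eta^2 =", round(eta_squared(x, y), 3))
```

When eta squared is close to r squared for every pair of variables, as the authors report, treating the relationships as linear loses little.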

Even given the generally small magnitude
of the correlations in Table 1, a pattern of
relationships emerged. The most potent
student characteristics variable was the
grade expected by students in a class,
suggesting that the grading leniency of the
instructor is a potent factor in student
evaluations. The general rating of
instruction at Southern Illinois University at
Carbondale was the next most important
variable in terms of its relationship to student
ratings. This finding hints at the presence
of a leniency factor in student ratings, since
all students in all classes were rating the
same university.

The items from the IIQ also varied in their
relations to the teacher-student-class 
characteristics variables examined. Items 
reflecting a student orientation factor (Items 
16 and 19; see Table 1) tended to correlate 
highest with the teacher-student-class 
characteristics variables. 

Canonical analysis was selected for this 
study because of its appropriateness in set- 
tings where the researcher wishes to examine 
the relationship between two sets of vari- 
ables. Canonical analysis approaches this 
problem by solving for two sets of weighting 
coefficients, which when applied to each set 
of variables will form composite variables 
that maximally correlate. Canonical anal- 
ysis can solve for multiple orthogonal sets of 
weighting coefficients, each set indicative of 
an independent pattern of relationships 
between the variable sets (Cooley & Lohnes,
1971).
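
To make the idea of maximally correlated composites concrete, the following Python sketch computes the first canonical correlation for the smallest nontrivial case, two variables in each set, where the 2 × 2 matrices can be inverted and the largest eigenvalue found in closed form. It is an illustration under stated assumptions, not the procedure or software used in this study, which involved far larger variable sets; the data and all function names are made up for the example.

```python
import math


def center(cols):
    """Subtract each column's mean from its values."""
    centered = []
    for col in cols:
        mean = sum(col) / len(col)
        centered.append([v - mean for v in col])
    return centered


def cross(a_cols, b_cols):
    """Cross-product matrix between two lists of centered columns."""
    return [[sum(p * q for p, q in zip(a, b)) for b in b_cols] for a in a_cols]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(p, q):
    """Product of two small matrices given as nested lists."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]


def first_canonical_correlation(x_cols, y_cols):
    """Largest canonical correlation between two sets of two variables each."""
    x, y = center(x_cols), center(y_cols)
    sxx, syy = cross(x, x), cross(y, y)
    sxy, syx = cross(x, y), cross(y, x)
    # The eigenvalues of Sxx^-1 Sxy Syy^-1 Syx are the squared canonical correlations.
    m = matmul(matmul(inv2(sxx), sxy), matmul(inv2(syy), syx))
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    largest = (trace + math.sqrt(max(trace * trace - 4 * det, 0.0))) / 2
    return math.sqrt(largest)


# Made-up data: y1 is exactly x1 + x2, so one pair of composites correlates perfectly.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y1 = [a + b for a, b in zip(x1, x2)]
y2 = [1, 0, 0, 1, 1, 0]

rc = first_canonical_correlation([x1, x2], [y1, y2])
print(round(rc, 6))
```

In the made-up data, weights of 1 and 1 on the first set and 1 and 0 on the second produce identical composites, so the first canonical correlation is 1 up to floating-point error.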

The results of the canonical correlation 
analysis that related the teacher-student-class
characteristics variables to the 20 IIQ
rating items appear in Table 2. A complete
presentation of the canonical correlation 
analysis is prohibitive, so only the results for 
the first set of canonical functions are pre- 
sented. 

The largest canonical correlation, the one 
reported in Table 2, was .83 and was
significantly greater than zero (α = .01). The
figures presented in Table 2 are the correla-
tions, or loadings, of the original variables 
with the canonical functions. These load- 
ings amplified the relationships that were 
cursorily noted earlier by inspection of the 
R matrix in Table 1. 


Discussion 


The following three teacher-student-class 
characteristics variables were found to be 
Table 2
Loadings of Original Variables on the First
Pair of Canonical Variables

Instructional Improvement
Questionnaire items        Teacher-student-class characteristics variables
Item
number(a)   Loading        Variable                                   Loading
 1           .19           Course level                               [illegible]
 2           .19           M class GPA(b)                             [illegible]
 3           .13           M study hours per week                      .04
 4           .44           General rating of instruction
                             at SIU-C(c)                              [illegible]
 5           .65           M expected grade in class                   .81
 6           .47           M student year in school                    .46
 7           .43           % females in class                          .22
 8           .45           % taking course as elective                 .18
 9           .30           Class size                                 -.56
10           .51           Instructor's years of teaching
                             experience                                .05
11           .56           Instructor's self-rating of
                             personal warmth                           .55
12           .23           Instructor's rank(d)                       -.15
13           .33           Sex of instructor(e)                       [illegible]
14           .44           Instructor's self-rating of
15           .15             interest(f)                              [illegible]
16           .75
17           .21
18           .57
19          [illegible]
20           .48

Note. Canonical correlation coefficient = .83; Bartlett's
chi-square = 178.4 (df = 33, p < .01).

(a) See Table 1 for the item statements corresponding to the item
numbers.

(b) GPA = grade point average.

(c) SIU-C = Southern Illinois University at Carbondale.

(d) 1 = lecturer and 5 = professor.

(e) 0 = male and 1 = female.

(f) 0 = content of course and 1 = students.

important factors in student ratings of 
teacher effectiveness: 

Expected grade in the course. The
findings of the present study were consistent
with results reported by Pohlmann (1975),
Centra and Linn (Note 2), and Christensen
and Bourgeois (Note 4), which indicate a
positive relationship between student ratings
of instructors and expected grade in the
course.

Class size. Small classes received higher
ratings than large classes in this study.

Teacher warmth. Consistent with
previous findings (Baird, 1973; Costin & Grush,
1973; Elmore & LaPointe, 1975; Isaacson et
al., 1963; McKeachie & Lin, 1971;
McKeachie et al., 1971), teacher warmth was an
important variable influencing student
ratings of teacher effectiveness.

The IIQ items that were most predictable
from the teacher-student characteristics
variables were (a) “encouraged student
participation,” (b) “showed an interest in
students,” (c) “knew if students understood
her or him,” (d) “was available outside of
class,” (e) “increased your appreciation for
the subject,” and (f) “accepted criticism and
suggestions.” These items measure the
degree of teacher-student interaction or the
degree of student orientation exhibited by
an instructor.

In general, it was found that warm
instructors teaching small classes with
students that expected high grades received
higher teacher effectiveness ratings on items
measuring the degree of the instructor’s
orientation toward students than on items
measuring other aspects of teaching
effectiveness, such as course difficulty and
presentation of material. These results seem to
offer some information concerning the
discriminant validity of student ratings of
teacher effectiveness.

Further research is needed to determine
the variables that affect student ratings of
college instructors. The teacher-student-class
characteristics variables included in
this study do not correlate highly with
student ratings; therefore, a large portion of the
variance in student evaluations of college
instructors is attributable to sources other
than those examined here.
Cognitive Style in Two Modalities: Vision and Audition


Cynthia Bellows Kennedy and Eliot J. Butter 
University of Dayton 


Eighty-one fourth-grade children were individually administered the visual
Matching Familiar Figures Sequential Presentation Task (MFF-SPT) and the
Auditory Impulsivity Task (AIT), two match-to-sample tasks designed to
measure cognitive style. A moderate negative correlation was found between
errors and latencies on the AIT, thus indicating that longer latency did not
always result in better performance. A high negative correlation was found on
the MFF-SPT. Fifty-five percent of the children maintained their
classification as reflective, impulsive, fast-accurate, or slow-inaccurate across the two
modalities, providing evidence that the two tasks were measuring somewhat
different abilities. Children employed the same search strategy in both
modalities. The suggestion was made that auditory cognitive style be
investigated for relationships with reading ability.


Many researchers have attempted to
define and explain the relationship between
auditory perceptual and/or discrimination 
skills and reading. Equivocal results have 
been reported in those studies employing 
correlational analyses of auditory tasks and 
reading measures. Some authors report 
high associations between the two (Bruin- 
inks, 1969; Rosner, 1973; Wepman, 1960, 
1975), with the correlation between auditory 
measures and reading often exceeding that 
between IQ and reading (Harrington & 
Durrell, 1955; Lingren, 1969; Machowsky & 
Meyers, 1975). However, others have re- 
ported low correlations between reading and 
audition (Dykstra, 1966; Hammill & Larsen, 
1974; Morency, 1968). A possible explana- 
tion for this equivocality of results could be 
that studies reporting strong relationships
generally employed auditory perceptual or 
discrimination tasks involving some re- 
sponse uncertainty (e.g., the Wepman Au- 
ditory Discrimination Test [1958], the Kin- 
dergarten Auditory Screening Test [Mar- 
golis, 1977], and the Auditory Analysis Test 
[Rosner, 1973]), whereas reports failing to 
find shared variance have assessed auditory 
skills with tasks of relatively low response 
uncertainty (e.g., Digit Span). Perhaps it is 
not only auditory discrimination/perceptual 
ability that is sharing variance with reading 
but rather a combination of auditory skills 
and cognitive strategy in approaching 
problems of response uncertainty. 

Research from another vein has shown 
some, although guarded, association between 
cognitive strategy and reading. Numerous 
studies employing the Matching Familiar 
Figures test (MFF; Kagan et al., 1964), a vi- 
sual match-to-sample task of high response 
uncertainty, have shown strong relationships 
between cognitive style and reading behav- 
ior. Reflective children (responding with 
long latencies and few errors on the MFF) 
scored significantly better than impulsives 
(responding quickly and with many errors) 
in the reading of English words presented 
singly or in prose (Kagan, 1965); in reading 
comprehension (Lesiak, 1971); on oral 
reading performance (Butler, 1973); and in 
first-grade reading readiness (Kalash, 1973; 
Shapiro, 1976). However, the relationship 
to reading is not totally clear here either.
Other researchers have found no such dif- 
ferences between cognitive style groups on 
reading measures (Denney, 1974; Hood & 
Kendall, 1975; Margolis, 1976). Therefore, 
there seems to be something lacking in a test 
of visual cognitive style, so that the effect is 
inconsistent across samples. 

Mussen, Conger, and Kagan (1974) argue
that in learning to read, the child proceeds
through a series of phases. First, the child 
must be able to “identify the letters that 
compose the words. This process involves 
distinguishing the distinctive patterns of 
lines that define each of our alphabetic let- 
ters” (Mussen et al., 1974, p. 290). Sunshine 
and DiVesta (1976), in addressing this issue, 
found that first-grade impulsives responded 
more quickly and with more errors on a letter 
discrimination task than did children
classified as reflective. The reader must be
cautioned, however, as Sunshine and DiVesta
classified impulsives as children who made
more than the median number of errors
without regard for their latencies. Second,
and of equal importance, Mussen et al. 
(1974) argue that the child must learn the 
names and sounds of the letters. Finally, 
the child must have a well-enough defined 
auditory short-term memory system, so that 
the sounds of each of the letters comprising 
a word will be remembered until all of the 
letters are identified and integrated and the 
word formed. Mussen et al. (1974) state 
that “children who have not matured to the 
point where they are able to hold one or two 
items of information in memory while they 
are working on another item and to integrate 
the previously stored information with the 
freshly perceived information will have dif- 
ficulty learning to read" (p. 291). The im- 
portance of being able to integrate each 
component of the word (each letter) with the 
rest should not be mitigated, for it is this 
integration that completes the process and 
produces the word. 

Visual tasks of response uncertainty (e.g., 
the MFF) definitely require the child to 
discriminate well and approach the problem 
carefully. Although this is a necessary 
condition of good reading, it is not sufficient; 
other abilities are also required. Therefore, 
it is possible for a child to perform well on 
visual cognitive style tests and yet read
poorly. On the other hand, a task that taps
these abilities (good discrimination and 
careful approach to the problem), as well as 
those other abilities necessary for reading, 
should be able to discriminate between good 
and poor readers and even predict potential 
reading ability in prereaders. The other 
abilities that need tapping are auditory 
short-term memory, auditory integration 
powers, and encoding and decoding abili- 
ties. 

The authors have developed a task that 
they believe meets these qualifications. The 
Auditory Impulsivity Task (AIT) is a 
match-to-sample task similar to the MFF 
and, therefore, involves high response un- 
certainty, careful discrimination, and scan- 
ning strategies. However, the AIT is in the 
auditory modality. A sequence of beeps 
with temporal pauses is played (standard), 
and the child must find the matching audi- 
tory sequence from among four variants. To 
perform well on the task, the child must be 
able to remember the entire sequence of 
beeps and pauses, must be able to integrate 
them into some coded pattern so that the 
exact match can be found (some of the in- 
correct variants have identical patterns to 
the standard for the first half of the se- 
quence), and must be able to reproduce the 
coded pattern for comparison (similar to 
remembering, integrating, and reproducing 
the sounds for D, O, and G to sound out 
DOG). 

The purpose of this study was to test the 
AIT and determine whether there was any 
similarity of performance across the AIT and 
the MFF. It was expected that there would 
exist some relationship in performance be- 
tween the two tasks because of the shared 
abilities required to perform well on both. 
However, it was expected that the relation- 
ship would not be unity because the AIT 
assesses capabilities not required for success 
on the MFF. Actual scanning strategies 
employed by the children in searching for 
the match to the standard were also looked 
at across modalities. It was expected that 
strategies would not change across modality 
because it is believed that cognitive style is 
a basic problem-solving strategy employed 
on all tasks of response uncertainty. Similar 
strategies would lead to differential perfor- 
mance, however, because of the different 
abilities being tapped in the two modalities. 
The search strategy analysis was included 
mainly as a control feature, so that if differ- 
ences in performance were found, they could 
be attributed to differences in the way the 
child handled the information received 
rather than to what information was actually 
perceived. 


Method 
Subjects 


Eighty-one fourth-grade children (45 males and 36 
females) were included in the final sample after 7 chil- 
dren were excluded because of incomplete data. 
Subjects were drawn from two southwestern Ohio public 
schools of comparable lower middle class status. Mean 
age of the final sample was 10 years 1 month (range 9 
years 1 month to 10 years 8 months). 


Tests and Apparatus 


After the Wepman Auditory Discrimination Test 
(Wepman, 1958) was administered to screen any child 
displaying auditory deficiency, the Matching Familiar 
Figures Sequential Presentation Task (MFF-SPT) and 
the Auditory Impulsivity Task were presented. 

Matching Familiar Figures Sequential Presentation 
Task. The MFF-SPT was a visual match-to-sample 
task similar to that devised by Kagan, Rosman, Day, 
Albert, and Phillips (1964), except that the seven pic- 
tures (standard and six alternatives) were not visible 
simultaneously. Rather, only one picture could be 
viewed at a time, thus introducing a short-term memory 
component. Siegelman (1969) reported that the im- 
position of a memory requirement on a similar task had 
no appreciable effect on performance. The pictures 
were the line drawings of the MFF. The child was 
presented with a 76.2 × 53.3 cm stimulus panel con- 
sisting of seven doors that opened left to right. The 
doors were arranged with the standard centered above 
two rows of three alternatives. Subjects were told that 
pictures were behind the doors and could be seen by 
sliding open one door at a time. Pictures could be 
looked at an unlimited number of times and in any 
order, with the task being to match the standard picture. 
Microswitches behind the panel recorded the opening 
and closing of each door on an Esterline-Angus 8-pen 
Minigraph Recorder. After 2 practice trials, 12 test 
trials were given, with a maximum of 6 errors per trial 
allowed before the child was shown the correct 


Auditory Impulsivity Task. Devised for this study, 
the AIT was a match-to-sample task in the auditory 
modality patterned after the MFF-SPT. Subjects were 
presented with a 28-cm-square sloping panel containing 
five buttons: one at the top (standard) and two rows 
of two buttons each (alternatives) positioned below. 
When a button was pressed, a sequence of tones was 
played through earphones. The subject had to find the 
one button that played the exact tonal sequence as the 
standard. Again, the child was allowed to push the 
buttons as often as desired and in any order but had to 
listen to an entire sequence before pressing another 
button. Microswitches behind each button recorded 
the search patterns on an Esterline-Angus Mini-Event 
Recorder. The AIT consisted of 2 practice and 10 test 
trials, with four errors allowed per trial before the cor- 
rect button was indicated. 

Each button played a tonal sequence of four to seven 
15-sec 400-Hz tones separated by one to three pauses. 
The total time of a sequence ranged between 21) sec and 
5 sec. Trials were randomly ordered in terms of diffi- 
culty, with the same order used for each subject. The 
correct alternative was varied, so that it occupied each 
position three times across practice and test trials. 

The AIT was not identical to the MFF-SPT in that 
only 4 alternatives were presented rather than 6, and 
only 10 test trials were given rather than 12. This re- 
sulted from pilot testing that showed that children be- 
came tired and fatigue became a more compelling 
variable than strategy. 


Procedure 


The study was conducted in two sessions. Forty-one 
children were individually administered the Wepman 
Auditory Discrimination Test followed immediately by 
the MFF-SPT during the first session. These children 
received the AIT alone during the second session. The 
MFF-SPT and AIT were given in reverse order for the 
remaining 40 children. Each session averaged about 
25-30 minutes, and all children were seen for the second 
time within 10 days of their first session. 


Results 


A canonical correlation was performed 
between error and latency scores on the 
MFF-SPT and AIT to assess the relation- 
ship between overall performance in the vi- 
sual and auditory modalities. The canonical 
correlation of .42 was found to be significant 
by the chi-square test, χ²(4) = 16.30, p < .01. 
Thus, performance on the two tasks was 
similar with respect to speed and accuracy 
combined. 

Pearson product-moment correlations 
were performed between error and latency 
measures separately within each task. The 
MFF-SPT correlation was evaluated to in- 
dicate (a) that the sample employed was 
similar to previous samples used in other 
investigations and (b) that the slight modi- 
fication imposed on the original MFF did not 
influence performance on the MFF-SPT. 

Table 1 
Intercorrelations Between Error and Latency 
Measures on the Matching Familiar Figures 
Sequential Presentation Task (MFF-SPT) 
and the Auditory Impulsivity Task (AIT) 

                        AIT 
MFF-SPT measure    Error     Latency 
Error               .42**    −.43** 
Latency            −.33*     AT** 

*p < .05. 
**p < .01. 


As expected, a high negative correlation was 
found between MFF-SPT error and latency 
measures (r = —.68, p < .01). Further, the 
correlation was within the upper range re- 
ported by previous studies employing the 
MFF. 

Because the AIT was also designed as a 
match-to-sample task of high response un- 
certainty, a negative correlation between 
errors and latency was expected. However, 
because other capabilities not previously 
assessed were required, the degree of the 
association between errors and latency could 
not be predicted. A moderate but signifi- 
cant correlation of —.32 (p < .05) was found. 
Table 1 presents the intercorrelations be- 
tween error and latency measures across the 
two tasks. 

The standard procedure for classifying 
subjects into cognitive style groups was ap- 
plied to both the MFF-SPT and AIT. The 
MFF-SPT median split classification (8 total 
errors and 34-sec mean latency) resulted in 
35 reflectives and 31 impulsives; AIT median 
splits (5 total errors and 50-sec mean laten- 
cy) produced 27 reflectives and 28 impulsives. 
Remaining subjects were classified 
in the extreme groups: fast-accurate and 
slow-inaccurate. Forty-five of the 81 chil- 
dren (55%) retained their classification 
across the two modalities. However, the 
remaining 45% switched styles across the two 
tasks. Of these 36 children, 25 shifted across 
only one median, and 11 subjects (less than 
14% of the original sample) crossed both 
medians. Table 2 presents the classification 
data for the two tasks. 
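
The double median split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `classify` and `medians_crossed` are hypothetical, and the handling of scores that fall exactly on a median (treated here as "fast" and "accurate") is an assumption, since the text does not say how ties were resolved.

```python
from statistics import median

def classify(errors, latencies):
    """Label each child by a double median split on errors and latency.

    R = reflective (slow, accurate), I = impulsive (fast, inaccurate),
    FA = fast-accurate, SI = slow-inaccurate.
    """
    err_med = median(errors)
    lat_med = median(latencies)
    labels = []
    for err, lat in zip(errors, latencies):
        slow = lat > lat_med        # assumption: ties count as fast
        inaccurate = err > err_med  # assumption: ties count as accurate
        if slow and not inaccurate:
            labels.append("R")
        elif not slow and inaccurate:
            labels.append("I")
        elif not slow and not inaccurate:
            labels.append("FA")
        else:
            labels.append("SI")
    return labels

def medians_crossed(style_a, style_b):
    """Return how many medians (0, 1, or 2) separate two style labels."""
    slow = {"R": True, "SI": True, "I": False, "FA": False}
    inaccurate = {"I": True, "SI": True, "R": False, "FA": False}
    return (int(slow[style_a] != slow[style_b])
            + int(inaccurate[style_a] != inaccurate[style_b]))
```

Under this scheme a child who is reflective on one task and impulsive on the other has crossed both medians, whereas a shift from reflective to fast-accurate crosses only the latency median.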

Esterline-Angus recordings on both tasks 
were reduced to five resultant scanning 
strategy measures: (a) Os, the number of 
observations made to the standard; (b) Oz, 
the number of observations made to the al- 
ternatives not chosen; (c) O., the number of 
observations made to the alternative chosen; 
(d) Ao, the number of different alternatives 
observed; and (e) Or, the number of obser- 
vations made to the most frequently ob- 
served alternative. These five measures 
were included simply to determine whether 
similar search strategies were employed 
across modality. If the strategies were 
similar, crossovers in classification and dif- 
ferences in within-task correlations could be 
attributed to how the child processed the 
information received rather than to the 
scanning pattern used. 

Table 2 
Classification Data for the Two Tasks 

                       AIT classification 
MFF-SPT style     R     I    FA    SI   Total 
R                21     5     5     4     35 
I                 3    19     6     3     31 
FA                2     2     1     3      8 
SI                1     2     0     4      7 
Total            27    28    12    14     81 

Note. Entries are the number of children classified. AIT = 
Auditory Impulsivity Task and MFF-SPT = Matching Familiar 
Figures Sequential Presentation Task. R = reflectives, I = im- 
pulsives, FA = fast-accurates, and SI = slow-inaccurates. 


Difference scores were computed across 
the two modalities between these measures 
and tested against zero. Prior to the anal- 
ysis, transformations were applied to all data 
to equate the MFF-SPT and AIT because of 
the differing number of alternatives and 
trials across the tasks. The total number of 
observations of each search strategy measure 
was divided by the number of trials and al- 
ternatives on the respective test. Initially, 
scores were tested for all children in the 
sample. A high number of observations 
were noted for all measures across both 
tasks. Only Oz displayed differences across 
modality (p < .01). More observations of Oz 
were made on the MFF-SPT. Because 
many investigations deal only with the two 
major cognitive style groups, reflectives and 
impulsives, Hotelling's trace criteria were 
computed on the difference scores for re- 
flectives and impulsives separately. No 
differences were found between search 
strategies employed on the MFF-SPT and 
AIT for either group (all Fs < 1). 
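
The transformation and difference scores described above can be illustrated with a short Python sketch. Several points are assumptions rather than details given in the text: division by the product of trials and alternatives is one reading of "divided by the number of trials and alternatives," the one-sample t statistic is only one way a set of difference scores might be tested against zero, and all function names are hypothetical. The trial and alternative counts are those reported for the two tasks.

```python
from math import sqrt
from statistics import mean, stdev

# Test trials and alternatives as reported for each task.
MFF_SPT = {"trials": 12, "alternatives": 6}
AIT = {"trials": 10, "alternatives": 4}

def normalize(total_observations, task):
    """Scale a raw observation count by trials x alternatives (assumed reading)."""
    return total_observations / (task["trials"] * task["alternatives"])

def difference_score(mff_total, ait_total):
    """MFF-SPT minus AIT, after putting both counts on a common scale."""
    return normalize(mff_total, MFF_SPT) - normalize(ait_total, AIT)

def t_against_zero(differences):
    """One-sample t statistic for the hypothesis that the mean difference is 0.

    This is an illustrative choice of test, not necessarily the one used.
    """
    n = len(differences)
    return mean(differences) / (stdev(differences) / sqrt(n))
```

A positive difference score on a measure means that, per trial and alternative, more observations were made on the MFF-SPT than on the AIT.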


Discussion 


A significant negative correlation was 
found between AIT error and latency scores. 
However, the correlation was not as high as 
typically reported for the visual modality. 
Although the AIT and MFF-SPT are both 
match-to-sample tasks of high response 
uncertainty, the difference in correlational 
values could indicate the two tasks are tap- 
ping somewhat different abilities. On the 
MFF-SPT, longer latencies lead to, or at 
least are associated with, fewer errors. On 
the AIT, however, increased latency is only 
slightly beneficial in committing fewer errors. 
The authors believe this moderate correla- 
tion on the AIT is due mainly to the in- 
creased number of subjects classified in the 
two extreme groups, that is, fast-accurates 
and slow-inaccurates. The results indicate 
that about 14% more subjects were classified 
in the extreme groups on the AIT than on 
the MFF. Although in the past, most liter- 
[...] sample (see Messer, 1976, for a review). 

The authors contend that the fast-accu- 
rates (and some of the reflectives) are capa- 
[...] require fewer observations and comparisons 
of stimuli not identical to the standard. The 
finding that in the entire sample, fewer ob- 
servations were made of Oz (not chosen al- 
ternatives) on the AIT than on the MFF- 
SPT, whereas this difference was not found 
when only the major groups of reflectives 
and impulsives were examined, partially 
confirms this explanation. If, on the AIT, 
children integrated the beeps and pauses 
into a coded pattern, there was no need to 
check an incorrect alternative many times 
because discrepancies would become obvious 
with few observations. However, the task 
was not without high response uncertainty 
as indicated by the number of observations 
to the standard and other alternatives and 
the number of total errors committed. 
Slow-inaccurates (and possibly some im- 
pulsives), on the other hand, may not have 
been able to integrate the stimulus compo- 
nents and thus placed great demands on 
their auditory memory, trying to retain the 
temporal number and spacing of each ele- 
ment. The AIT requires that short-term 
memory be sufficient to retain a complicated 
series of stimuli across many seconds. (The 
MFF-SPT also requires short-term memory, 
but this component necessitates remem- 
bering only a single element [e.g., a square 
versus round chimney] for a much shorter 
period of time.) For those children who 
could not integrate the stimulus elements 
even with several observations of the stan- 
dard and alternatives and who thus had 
longer latencies, a correct match could not 
be found on the AIT. 
Although there were differences in Oz 
when fast accurates and slow inaccurates 
were included in the analyses, other mea- 
sures of scanning strategy did not differen- 
tiate between modality, thus indicating that 
across the two tasks, children displayed 
similar carefulness and attention to the 
problems. This similarity in approach 
supports the notion that reflection-im- 
pulsivity (or approach to tasks of high re- 
sponse uncertainty) is of a more general, 
global, cognitive nature and not subservient 
to individual modalities. Therefore, it was 
not the child’s plan of [...] that was instru- 
mental in differential performance 
on the tasks. Instead, it was the pres- 
ence or absence of capabilities tested by each 
task that accounted for changes in the cor- 
relation values and, in some cases, classifi- 
cation style. 

The present study indicates that the AIT 
might tap some of the abilities Mussen et al. 
(1974) define as requirements of good read- 
ing (careful scanning and encoding, inte- 
gration of individual elements, and so on). 
In view of this and the literature relating 
auditory discrimination to reading ability 
and also cognitive style to reading, the au- 
thors feel it would be reasonable to admin- 
ister the AIT and MFF-SPT to children 
along with assessments of reading. It is ex- 
pected that the AIT will relate more to 
reading ability than will the visual test of 
cognitive style. The authors are currently 
involved in analyzing the results of such an 
investigation. Although not all analyses are 
completed, preliminary findings indicate 
that AIT error and latency scores, entered 
into a multiple regression analysis, account 
for significantly more of the variance in a 
standard reading test than when the same 
measures from the MFF-SPT are em- 
ployed. 

The importance of determining the sal- 
ience of the AIT to reading is two-fold. 
First, it is possible that some readers may 
experience problems not because of lack of 
vocabulary or understanding of syntax rules, 
but rather because they lack the basic 
capabilities of good reading (careful dis- 
crimination, encoding, decoding, and inte- 
gration). The AIT may aid in identifying 
children with these specific reading prob- 
lems. Second, if the AIT can be employed 
with prereaders, it may eventually be used 
to screen potential problem readers, that is, 
children who would enter reading instruction 
without having mastered those basic abilities 
needed to succeed. Perhaps, then, those 
children could be specifically trained in the 
components of reading ability that they lack, 
and thus many problems could be precluded. 

These high aspirations for the usefulness 
of the AIT in the future should be held with 
reservation because much research needs to 
be conducted before such expectations can 
become realities. However, the present 
study, the preliminary report from the au- 
thors’ laboratory, and past literature all seem 
to warrant further attention to this possi- 
bility. 


Evaluation of the Effects of Feedback Associated with a 
Problem-Solving Approach to Instruction on Teacher and 
Student Behavior 


Two experiments were completed to determine if feedback related to prob- 
lem-solving competencies would result in changes in both teacher and student 
behavior. The results of Experiment 1 indicated that experimental teachers 
(n = 19) were more effective than control teachers (n = 17) in (a) controlling 
their own behavior and (b) reducing learner inattention. The results of Ex- 
periment 2 indicated that experimental teachers were more effective than con- 
trol teachers as measured by their abilities (a) to attempt to diagnose more 
often, (b) to employ appropriate diagnostic procedures, and (c) to assist stu- 
dents ultimately in mastering previously missed questions. Thus, both exper- 
iments demonstrate that predicted changes in teacher and student behavior 
occur if teachers receive feedback associated with a problem-solving approach 
to instruction. 


Various attempts to specify teacher com- 
petencies as a basis for teacher education 
programs have led to the development of 
numerous lists of such competencies (Gen- 
eric Teaching Competencies, 1974; Weber 
College, 1974). Rosenshine (1974), in his 
listing of variables thought to reflect teacher 
competency, indicates that these com- 
petencies are promising areas for research 
but states, “They cannot be used as 
checkpoints to assess teacher competency 
because the research to date is too incom- 
plete” (p. 139). A possible reason for the 
incompleteness of the research is that the 
competencies listed may not be appropriate 
for all teaching situations. Thus, compari- 
sons of the effectiveness of teachers, when 
the comparisons are based on teachers 
demonstrating a particular competency, may 
lead to equivocal results if the instructional 
conditions surrounding the problems should 
change. To reduce the probability of the 
occurrence of such equivocal findings, an 
attempt must be made to identify com- 
petencies that are generalizable to most in- 
structional conditions that involve differ- 
ences in learner problems. 

One possible way to specify competencies 
that have a high degree of generalizability is 
to use a problem-solving approach. Such an 
approach, according to Dewey (1933), re- 
quires defining the problem, observing and 
collecting data, formulating a hypothesis, 
testing the hypothesis, and drawing and 
applying a conclusion. Because identifying 
the problem, observing and collecting data, 
and generating and testing hypotheses are 
not peculiar in terms of their applicability to 
a given learning situation, a problem-solving 
approach would appear to be generalizable 
to most instructional situations. 
When teachers are taught to use a prob- 
lem-solving approach, they must be given 
feedback to help them evaluate their teach- 
ing effectiveness in terms of their utilization 
of the various elements of the approach. 
Thus, teachers who have difficulty in ac- 
quiring a problem-solving approach should 
be given feedback that will allow them to 
determine whether their inability to use a 
problem-solving approach effectively is a 
function of (a) their failure to use feedback, 
(b) their inability to generate hypotheses, (c) 
their inability to make valid observations, or 
(d) their inability to relate knowledge to 
explanation. 

While problem-solving competencies are 
probably necessary conditions for effective 
teaching, they may not be sufficient. There 
may be situations where a teacher may de- 
velop the ability to utilize a problem-solving 
approach in case study situations but may 
not be able to do so in a live classroom set- 
ting. Careful consideration must always be 
given to factors that limit the teacher’s 
ability to make valid observations. For in- 
stance, Good and Brophy (1973) refer to 
“teacher awareness,” that is, the teacher’s 
consciousness of what he does in the class- 
room; Moore, Schaut, & Fritzges (Note 1), 
alluding to the same concept, discuss the 
importance of “control of one's teaching 
behavior." Teacher “awareness” or “con- 
trol” is important because it probably es- 
tablishes the upper limit on a teacher’s 
ability to observe as a basis for hypothesis 
generation and testing. For example, a 
teacher who simply responds reflexively to 
a learner behavior, as opposed to one who 
deliberately processes the observation, is not 
likely to be able to generate hypotheses ap- 
propriate for modifying the learner’s be- 
havior based on the observation. 

Evidence from studies conducted by Borg, 
Kelley, Langer, & Gall (1970), Emmer 
(1967), Brophy and Good (1970), and others 
indicates that teachers are often unaware of, 
or misinterpret, their own teaching behavior. 
Thus, if teachers are to develop the ability to 
observe as a basis for hypothesis generation 
and testing, conditions must be provided to 
enable teachers to acquire awareness or 
control of these teaching behaviors. 

In an effort to demonstrate the effective- 
ness of training teachers to use a problem- 
solving approach where attention was given 
to providing appropriate feedback and to 
increasing the probability that teachers 
would be able to control their own teaching 
behavior, Moore et al. (Note 1) completed a 
series of experiments with in-service teach- 
ers. In addition to control, they suggested 
that teachers wishing to acquire a problem- 
solving approach to instruction must develop 
two more specific competencies involving 
hypothesis generation and testing associated 
with both (a) the reduction of learner inat- 
tention and (b) the development of instruc- 
tional sequences appropriate for the needs 
of individual learners. These behaviors take 
on particular importance if one accepts the 
assumption that learning will occur if a 
learner attends to the learning task and if the 
task is appropriately sequenced for the in- 
dividual learner. While some research re- 
lating learner attention to learner perfor- 
mance has not resulted in particularly high 
correlations (Cobb, 1972; Lahaderne, 1966; 
Morsh, Burgess, & Smith, Note 2), most ex- 
perts agree that learner attention to the 
learning task is an important mediating 
variable in learning. Good and Brophy 
(1970) made this point when they said, “One 
fundamental fact we know is that one must 
attend to and think about most learning 
tasks if he is to master them" (p. 298). The 
importance of appropriate instructional se- 
quencing to meet the needs of individual 
learners who differ in motivational and in- 
tellectual characteristics has been docu- 
mented in a number of research studies 
(Brown, 1970; Moore, Smith, & Teevan, 
1968; Niedermeyer, Brown, & Sulzen, 
1969). 

A fourth competency suggested by Moore 
et al. (Note 1) was the ability to generate and 
test hypotheses associated with maximizing 
the effectiveness of instructional systems for 
the largest number of students. They jus- 
tified the importance of this competence 
with the assumption that most teachers will 
be responsible for instructing more than one 
student during the instructional period. 
Thus, a teacher must utilize a problem- 
solving approach in order to make efficient 
use of the various available modes of in- 
struction (including the teacher) to maxi- 
mize effectiveness in purposefully inter- 
vening with the largest number of students. 

In an experiment conducted by Moore et 
al. (Note 1), data were collected associated 
with the competencies involving controlling 
one's teaching behavior (Competency 1) and 
generating and testing hypotheses regarding 
learner attending behavior (Competency 2). 
Using volunteer in-service public school 
teachers (N = 56) who were randomly as- 
signed to experimental and control condi- 
tions, the experimenters provided in-class 
feedback for the experimental group with 
respect to these competencies for a period of 
6 to 8 weeks. Observation, as a basis for 
providing feedback, was completed by 
project staff using an instrument designed 
by the principal investigator. To evaluate 
the project, trained evaluators were used 
who had no knowledge concerning which 
teachers were assigned to experimental or 
control conditions. The experimental and 
control groups were compared on (a) observed 
instances of learner inattention, as a measure 
of the effectiveness of feedback to the teacher 
associated with Competency 2, and (b) fre- 
quency of teacher intervention with students 
classified as high need (that is, students rated 
by their teacher as high in need of teacher 
intervention for learning to occur), as a measure 
of Competency 1. Reduction of learner inattention 
was used as a measure of a teacher’s effec- 
tiveness in relation to Competency 2, since 
a learner problem must exist, for example, 
inattention, in order for one to observe 
whether a teacher has the ability to solve the 
problem. If no problem of inattention is 
observed, there is no way of knowing 
whether learner attention is a function of 
excellence in teaching or whether it can be 
attributed to motivational variables other 
than the teacher. 
Moore et al.’s (Note 1) results indicated 
significant differences in student perfor- 
mance between experimental and control 
female teachers for the measure of Compe- 
tency 2 and a significant difference for both 
male and female teachers for the measure of 
Competency 1. In all cases where significant 
differences were observed, they favored the 
experimental conditions. One explanation 
given for the failure to obtain differences in 
student performance for the experimental 
male teachers in the evaluation of Compe- 
tency 2 was the small size of the sample of 
male teachers: There were seven in the ex- 
perimental group and three in the control 
group. 

While these data provide strong support 
for the effectiveness of the feedback pro- 
vided to the teachers, a number of questions 
remained unanswered. First, if the sample 
of males had been larger in the initial ex- 
periment, would significant differences in 
learner inattention have resulted? Second, 
could the positive results of this investigation 
be replicated with a different population? 
And third, could the major hypothesis (that 
teachers who receive feedback associated 
with the acquisition of a problem-solving 
approach to instruction are more likely to 
bring about predicted changes in their be- 
havior and ultimately in that of their stu- 
dents) be confirmed with regard to Com- 
petency 3? The purpose of the present 
study was to investigate these questions 
under controlled experimental conditions. 
Specifically, two experiments were com- 
pleted. The first was designed to replicate 
the experiment of Moore et al. (Note 1) 
under conditions that provided a larger 
number of male teachers. The second ex- 
periment was designed to test the research 
hypothesis under conditions in which 
teachers received feedback with respect to 
their ability to generate and test hypotheses 
associated with Competency 3: “con[...] 
organization (instructional sequencing) ap- 
propriate to the instructional needs of the 
individual learner.” 


Experiment 1 


Method 


Subjects. Thirty-six volunteer, nonpermanently 
certified teachers participated in Experiment 1, with 
19 in the experimental group and 17 in the control 
group. Teachers in the experimental and control 
groups were matched on the basis of sex, level taught 
(i.e., elementary or secondary), and the number of years 
of teaching experience. 

Procedure. The treatment condition consisted of 
providing the teachers with feedback associated with 
controlling their own teaching behavior and with gen- 
erating and testing hypotheses associated with reducing 
learner nonattending behavior. Experimental teachers 
met one evening per week for a 6-week period to acquire 
the theoretical knowledge necessary to utilize the in- 
class feedback. In addition, every experimental teacher 
participated in one simulated teaching experience 
(Moore, Gagné, & Biddle, 1973) in order that the con- 
ceptualizations would be demonstrated in an applied 
situation. Teachers also received feedback regarding 
their success in applying the acquired conceptualiza- 
tions in their own classrooms during two [...] 
each week for the 6-week period. 

Using the Moore teacher observation system (Moore 
et al., Note 1) to collect data, an experimenter [...] 
the use of the observation system provided feedback to 
teachers individually concerning control of their in- 
structional behavior. Specifically, the experimenter 
compared the actual number of teacher-initiated verbal 
interventions (either content-related questions or 
statements) for a randomly selected group of 10 stu- 
dents with the teacher's postclass ratings of these stu- 
dents regarding the amount of intervention they re- 
quired for learning to occur. The ratings were made on 
a 6-point scale. Students assigned a 4 or 5 on the rating 
scale were considered to be high-need students, that is, 
those most needing teacher intervention. Students 
assigned a 0 or 1 were considered low-need students, 
that is, those needing little or no assistance for learning 
to occur. A large discrepancy between the indicated 
student need and the number of teacher interventions 
with the student(s) was defined as a low level of control, 
and conversely, a low discrepancy between these vari- 
ables was defined as a higher level of control. Teachers 
were defined as having “low” control of their teaching 
behavior if they were observed to ask a small number 
of questions or direct few subject-matter-related 
statements to students they rated as 4s or 5s (high-need 
students) while directing a large number of questions 
or statements to students they rated as 0s or 1s (low- 
need students). Immediately following class, the 
teacher was provided feedback with respect to the ef- 
fectiveness with which Competency 1 was implement- 
ed. 
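
The notion of a discrepancy between rated student need and observed teacher interventions can be made concrete with a small Python sketch. The text gives no numerical cutoff for a "large" discrepancy, so the rule below (low control when high-need students receive fewer interventions on average than low-need students) is purely an illustrative assumption, and the name `control_level` is hypothetical. The rating bands (4 or 5 for high need, 0 or 1 for low need) follow the description above.

```python
def control_level(ratings, interventions):
    """Classify a teacher's control from need ratings and intervention counts.

    ratings: teacher's postclass need rating (0-5) for each observed student.
    interventions: observed teacher-initiated verbal interventions per student.
    """
    high_need = [n for r, n in zip(ratings, interventions) if r >= 4]
    low_need = [n for r, n in zip(ratings, interventions) if r <= 1]
    if not high_need or not low_need:
        # Without both groups there is nothing to compare.
        return "undetermined"
    high_mean = sum(high_need) / len(high_need)
    low_mean = sum(low_need) / len(low_need)
    # Assumed rule: attention skewed toward low-need students means low control.
    return "low" if high_mean < low_mean else "higher"
```

For instance, a teacher who directs most questions to students rated 0 or 1 while rarely addressing students rated 4 or 5 would be labeled "low" under this assumed rule.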

When a persistent pattern of teacher attention to 
students rated as needing little teacher intervention for 
learning to occur was observed, the information was 
presented to the teacher. The teacher was asked to 
explain why so much assistance had been provided to 
low-need students and so little assistance to high-need 
students. If the teacher explained that many questions
were asked of the low-need students to motivate them
or to encourage them to attend to the lesson, the teacher
was encouraged to try to develop alternate strategies
to increase motivation and attention, so that more of the
teacher's instructional time could be distributed to
students most in need of assistance. If the teacher
could present no rational explanation, the experimenter 
encouraged the teacher to try to develop strategies for 
intervening more often with students requiring assistance.
By encouraging the teachers to develop their
own hypotheses associated with the control of their 
teaching behavior, the problem of multiple interpre- 
tations associated with the feedback was reduced. 

The following procedures were used in collecting data 
and in providing feedback to the teachers with respect 
to their ability to generate and test hypotheses associated
with reducing learner inattention (Competency 2).
First, it was assumed that a persistent pattern of learner
inattention had to be identified before any meaningful
feedback could be provided to the teacher. A persistent
pattern of learner inattention was defined as three or
more unsuccessful attempts on the part of the teacher
to involve the learner in the instructional process.

When this condition was observed, each teacher response
to the unacceptable learner behavior was recorded.
If the teacher's responses were unvarying, it
was expected that feedback to the teacher regarding the
ineffectiveness of the unvarying response pattern would
increase the probability that in future similar cases, the
teacher's responding behavior would vary.
If the teacher's responses were observed to vary but the pattern
of learner inattention remained unchanged, the
teacher was provided a list of teacher responses and
asked to explain why the responses varied. If no rational
explanation was given, the teacher was encouraged
to devote more effort to developing explanations
that were based on a knowledge of concepts of human
behavior.

For example, if a teacher's first statement to an in- 
attentive student was, “Please pay attention,” and the 
second statement was, “Put the book away, now!", the 
teacher would have been reminded of the two different 
statements to the child and asked why the approach had 


ception of human behavior, If the teacher's ex; 
tion for the el ae 


productiveness of the hy 
probability 
tional 


sidering ane a 
ore pastor x in training, the teacher could use the 
experimenter as a source of knowledge or as a participant
in a discussion concerning alternative hypotheses
that could be generated to fit the situation. These
feedback procedures were repeated throughout the
experimental period.

i In an attempt to control, in part, for the — 
period. The Moore teacher 


exceeded .95. 
The measure of the effectiveness of the experimental
treatment in developing the teacher's ability to generate
and test hypotheses regarding the reduction of learner
nonattending behavior was the amount of student inattention
observed in the classroom. Learner inattention
was used as a primary dependent variable for
a number of reasons: First, it was the desire of the experimenters
to secure a measure of student performance
that would reflect changes in teacher behavior as a 
function of the experimental treatment. Learner in- 
attention, while differing from student performance on 
an academic task, is a measure of student performance. 
Second, learner inattention can be measured effectively 
across all subject matter and classroom situations. 
Third, learner inattention is logically related to learner 
performance on academic tasks. In addition, because 
observed reduction of an inattention problem is more 
likely to be a function of effective teaching than ob- 
served student attention, it was chosen as the dependent 
variable. Finally, learner inattention was used because 
rationally it was the most valid measure of the effec- 
tiveness of the experimental treatment in developing 
the teacher's ability to generate and test hypotheses 
associated with learner inattention. 

For purposes of evaluating the effects of training, the 
nonattending behavior of a random sample of 10 stu- 
dents was observed for each experimental and control 
teacher at the end of the experimental period. Spe- 
cifically, the nonattending behavior for the selected 
sample of 10 students was observed and tabulated at 
5-minute intervals during the class period. A tally mark 
was recorded in the appropriate column on the observation
instrument to indicate that the student was inattentive.
Inattentive students were those who were
judged to be not attending to the subject or activity 
designated by the teacher. Pupils were observed se- 
quentially, utilizing the same order of observation each 
time. The interscorer reliability on the measure of in- 
attention was .92. 
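
The tally procedure above lends itself to a simple percentage. The sketch below is an assumption about how a "percentage of inattention" could be derived from the tallies; the text does not state the denominator, so the function `percent_inattention`, the choice of nine observation rounds, and the tally values are all hypothetical.

```python
def percent_inattention(tallies, n_rounds):
    """Percentage of student-by-round checks on which a student was inattentive.

    `tallies` holds one count per sampled student: the number of observation
    rounds on which that student was marked inattentive. `n_rounds` is the
    number of 5-minute observation rounds in the class period (assumed here).
    """
    total_checks = len(tallies) * n_rounds
    if total_checks == 0:
        raise ValueError("need at least one student and one observation round")
    return 100.0 * sum(tallies) / total_checks


# Ten sampled students, nine rounds (both values are illustrative assumptions).
tallies = [0, 1, 0, 0, 2, 0, 0, 1, 0, 1]
pct = percent_inattention(tallies, n_rounds=9)
print(round(pct, 1))
```

Because every student is checked on every round, the denominator is fixed in advance, which keeps classes with different activity levels comparable.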


Results 


An unweighted means analysis of variance 
was completed to determine if following 
training teachers would be more effective in 
bringing their own teaching behavior under 
control. For this analysis, the teachers were 
stratified on sex, students were stratified on 
need, and the teachers were compared in 
terms of the number of teacher-initiated 
interventions with students classified as 
being high or low in need of teacher inter- 
vention for learning to occur. The interac- 
tion of the need of student and experimental 
treatment was the comparison of primary 
interest in this analysis. Significant dif- 
ferences were obtained as a function of the 
interaction between need of student and 
experimental treatment, F(1, 162) = 6.81, p < .01.
Results of the Newman-Keuls posttest 
analysis indicated that the experimental 
teachers gave a significantly greater amount 
of attention to high-need (HN) students (M
= 4.41) than they did to low-need (LN) students
(M = 1.71), while this difference was
not observed between the same groups for
the control teachers (M_HN = 2.70; M_LN =
2.51).

To determine the effectiveness of pro- 
viding feedback designed to increase the 
teacher's ability to generate instructional 
hypotheses associated with reducing learner 
nonattending behavior, t tests were completed
comparing mean percentage of
learner inattention for male and female
teachers in the respective experimental and
control groups. Because of the possible existence
of a ceiling effect with regard to the
dependent variable and the resulting difficulty
in interpreting interactions, an analysis
of variance was not used (Winer, 1962, p.
257). The results of the t test analyses are
presented in Table 1.

As can be observed in Table 1, a signifi- 
cant difference (p < .01) was observed for 
the respective comparisons. The mean 
percentage of inattention was lower for both 
experimental males and experimental fe- 
males than it was for the respective control 
groups. 
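
The t value reported for male teachers in Table 1 can be checked from the summary statistics alone. The sketch below assumes that the third row of each block in the table is a sample standard deviation and that a pooled-variance independent-samples t test was used; neither is stated in the text. The helper name `independent_t` is introduced here for illustration.

```python
from math import sqrt


def independent_t(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance t statistic for two independent groups."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error


# Male teachers, Table 1: control group first, then experimental group.
t_male = independent_t(26.225, 11.457, 11, 5.157, 5.001, 11)
print(round(t_male, 2))  # 5.59
```

Under these assumptions the computed value agrees with the tabled t of 5.588 to within rounding. The row for female teachers cannot be checked the same way because only one of its group sizes is legible.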


Experiment 2 
Method 


Subjects. Participants in Experiment 2 were 21
volunteer, nonpermanently certified teachers, with 11
in the experimental group and 10 in the control group.


Table 1

Summary of the t Test Analysis Comparing
Mean Percentage of Inattention in Experiment 1

                          Group
Teacher sex     Experimental     Control
Male
  n                 11             11
  M                  5.157         26.225
  σ                  5.001         11.457
  t                  5.588*
Female
  n                  8
  M                  5.616         16.140
  σ                  4.183          5.999
  t                  3.679*

* p < .01.

Teachers in the experimental and control groups were
matched on teaching level, sex, and years of experi- 
ence. 

Procedure. In Experiment 2, experimental teachers 
received feedback associated with their ability to gen- 
erate and test instructional hypotheses regarding the 
organization of instructional sequences appropriate for 
the needs of individual learners (Competency 3). Ex- 
perimental teachers were involved in a 3-hour seminar 
each week of the 6-week experimental period. The 
purposes of the seminars were (a) to assist teachers in 
acquiring the knowledge necessary for utilizing the in- 
class feedback and (b) to permit teachers to demon- 
strate the acquired conceptualizations in an applied 
situation by using simulated teaching experiences.
Teachers received feedback regarding their success in 
applying the acquired conceptualizations in their own 
classrooms during two 1-day sessions each week of the 
6-week period. 

The following procedures were used in collecting data 
and providing feedback to teachers associated with 
Competency 3. First, it was assumed that a learner 
problem and the teacher's reaction to the problem had
to be observed before meaningful feedback could be
provided. In this case, a learner problem was defined 
as being the learner's inability to answer correctly a 
teacher-initiated subject-matter-related question.
Specifically, student responses to teacher-initiated 
questions were observed. When a student failed to 
answer a question correctly, a tally mark was made on 
the observation instrument and the teacher's reaction
to the incorrect student response(s) was recorded.
The following procedures were used in providing
feedback to the teacher following the classroom observation.
If it were observed that the teacher's questions
or statements to the student having difficulty were
unvarying, that is, the teacher simply repeated the
question or asked a question of the same level of difficulty,
the teacher was given feedback designed to increase
the probability that variation in teacher behavior
would be observed under similar situations in the future.
If the teacher's responses were observed to vary, but the
learner was still unable to correctly answer the question,
the teacher was asked to explain during the feedback
session why the responses had been varied. If no rational
explanations were given, the teacher was encouraged
to devote more effort to developing instructional
sequences appropriate for individual learners
based on his or her knowledge of learning theory associated
with instructional sequencing.

For example, a teacher's original question may have
required a student to apply an acceptable definition in
classifying groups of words as positive or negative instances
of the concept sentence, and the student answered
incorrectly. The teacher then asked the student,
“What is a sentence?” followed by, “Give me a
sentence of your own.” The teacher was provided with
a list of the questions asked the student and asked why
those questions were raised. If the teacher answered,
“Those were the only two questions that I could think
of at the time,” the conclusion would have been that the
teacher did vary the instructional responses but lacked
an acceptable explanation for the actions. The feedback
from the experimenter would have been designed
to assist the teacher in determining what prerequisite
skills a student had to possess in order to answer the
initial (higher-order) question and what questions
(lower-order) could have been asked to determine if the
student possessed the prerequisite skills.

If the teacher's explanation for the instructional behavior
were judged to be based on a knowledge of appropriate
instructional sequencing, it was assumed that
questions associated with the ineffectiveness of the 
explanation would cause the teacher to seek additional 
knowledge as a basis for generating further explana- 
tions. 

As in Experiment 1, teachers could use the experi- 
menter as a source of knowledge or as a discussion 
participant where the purpose was the generation of 
alternate hypotheses or explanations. 

To control for the effects of type of feedback, the 
control group participated in Experiment 1 during the 
same experimental period, thus receiving feedback 
appropriate for the objectives of Experiment 1 but not
appropriate for the acquisition of the competency as- 
sociated with Experiment 2. 

Data collection. For purposes of evaluating the ef- 
fects of the experimental treatment, a random sample 
of 10 students in each experimental and control teach- 
er's class was selected for observation. During the 
45-minute class observations conducted at the end of 
the experimental period, responses of the selected stu- 
dents to teacher questions as well as the teacher reac- 
tions to student responses were recorded. As in the 
previous experiment, teachers were unaware of which 
students constituted the sample until the observation 
had been completed. Observations were made of 
changes in both the teacher and learner behavior, but
the ultimate measure of the effectiveness of the experimental
treatment was change in student behavior.
In this case, the measure was the ability of students,
ultimately, to answer questions that they had initially 
answered incorrectly. This measure was used for the 
following reasons: (a) Since it was a measure of student 
performance, it was thought to be a more valid measure 
of the effects of the experimental treatment than direct 
observations of changes in teacher behavior; (b) it was 
a measure of student performance directly related to the 
academic task; (c) it was a measure that could be used 
as a basis for experimental comparisons across
subject matter and classrooms; and (d) it could be reliably
observed. Interscorer reliability on the instrument
designed by the principal investigator was .92.

One measure of change in teacher behavior was
teacher response to students who incorrectly answered
questions. Specifically, teachers, when attempting to 
evaluate a student's mastery of instructional objectives, 
were observed to determine if they continued to stay
with the student following an incorrect answer. When 
teachers were observed to generate and test hypotheses 
related to the reorganization of the instructional se- 
quence to meet the needs of a particular student, the 
actual teacher questions were recorded and the appro- 
priateness of the questions was judged by the experi- 
menters according to a predetermined set of criteria. 
The criteria employed included (a) Were the questions 
(or transactions) lower-order questions? (b) Were the 
lower-order questions of a type that would produce responses
that could be explained in terms of a hierarchical
learning model?
These evaluation procedures, while appearing to be
somewhat similar to “probing” procedures discussed 
by Rosenshine and Furst (1971, p. 53), differed in that 
the questioning transaction used in the current study 
was not for the purpose of encouraging the student to 
interpret, elaborate, or generalize. Rather, it was as- 
sumed that the desired response could not be made by 
the learner unless the instructional sequence was re- 
structured. Thus, if the teacher asked the learner 
lower-order questions, the learner’s response or lack of 
response would provide information that could be used 
as a basis for restructuring the instructional sequence 
to make it more appropriate for the student. 


Results 


Several unweighted means analyses of 
variance, with teachers stratified on sex in 
each analysis, were completed to determine 
if teachers, given appropriate feedback, 
would be more effective in generating and 
testing hypotheses regarding student 
learning problems associated with instruc- 
tional sequencing. One of these analyses 
was completed to compare changes in stu- 
dent behavior as a function of the experi- 
mental treatment involving their teachers. 
Other analyses compared changes in teacher 
behavior. 

One measure of change in teacher behav- 
ior used to compare experimental and con- 
trol teachers was differences in teacher re- 
sponses to students’ incorrect answers. 
Following an incorrect response, the ques- 
tion considered was, Did teachers move to 
another student in their questioning or did 
they attempt to diagnose the problem by 
asking the student additional questions? 
An analysis of variance was used to deter- 
mine whether significant differences existed 
between the treatment groups in relation to 
the number of times teachers attempted to 
employ diagnostic procedures. 

There were no significant differences in 
the number of times teachers engaged in 
hypothesis generation and testing proce- 
dures as a function of sex of teacher, F(1, 17) 
= .62, or in the interaction of the treatment 
and the sex of the teacher, F(1, 17) = .13. 

A significant difference between experi- 
mental and control group teachers as a 
function of the treatment was observed, F(1, 
17) = 8.58, p < .01. The mean number of 
procedures initiated by experimental 
teachers (M = 2.91) was greater than the
mean number initiated by control group
teachers (M = 1.10).

Data were also compared to determine the 
probable appropriateness of the hypothesis 
generation and testing procedures employed. 
For this analysis, only procedures judged to 
be appropriate were included. Appropriate 
procedures, as explained previously, were 
those in which the teacher's behavior was 
observed to vary in a rational way, that is, to
proceed from the original question to questions
associated with prerequisite concepts
in the learning hierarchy in an effort to determine
what factors accounted for a student's
inability to answer the original question.
It was possible to compare data for
only 6 of the 10 teachers in the control group,
since only 6 teachers employed sequences
judged to be rationally defensible. Data
were available for all 11 experimental
teachers.

There were no significant differences in the
number of rationally defensible sequences 
initiated as a function of sex of teacher, F(1, 
13) = .53, or in the interaction of the treat- 
ment and the sex of the teacher, F(1, 13) = 
noe 

Significant differences were observed 
between experimental and control teachers 
as a function of the treatment, F(1, 13) = 
3.70, p < .10. In this case, the experimental
teachers initiated more rationally defensible
sequences (M = 2.91) than control teachers
(M = 1.67).

To determine the effects of the experimental
treatment on learner performance,
the experimental and control teachers were
compared in terms of the number of times
students correctly answered previously
missed questions. Again, there were no
significant differences as a function of the
sex of the teacher, F(1, 17) = .07, or the interaction
of treatment and sex of the teacher,
F(1, 17) = .04.

Significant differences were observed 
between experimental and control teachers 
as a function of treatment, F(1, 17) = 9.26, 
p < .01. Students of experimental teachers
correctly answered previously missed ques- 
tions more often (M = 2.27) than did stu- 
dents of control teachers (M = .60). 

Finally, an analysis was completed to ascertain
whether any significant difference
existed between experimental and control
group teachers with respect to the number 
of teacher-initiated questions that were 
initially answered incorrectly by the stu- 
dents. The assumption underlying this 
analysis was that if there were a significant 
difference between the groups in terms of the 
number of questions initially answered in- 
correctly by students, the results of other 
measures of the effectiveness of the experi- 
ment would be questionable. 

In this analysis, experimental and control 
group teachers were compared in terms of 
the number of teacher-initiated questions to 
which students initially responded incor- 
rectly. 

The results indicate that there were no
significant differences in the number of
questions answered incorrectly by students
as a function of treatment, F(1, 17) = .09, as
a function of sex of teacher, F(1, 17) = .66, or 
as a function of the interaction of treatment 
and sex of teacher, F(1, 17) = .66. 


General Discussion 


The data from Experiments 1 and 2 provide
support for the hypothesis that if
teachers are given feedback associated with 
(a) the control of their own teaching behavior 
and (b) the acquisition of competencies re- 
lated to a problem-solving approach to in- 
struction, predicted changes will occur in 
both their behavior and ultimately in the 
behavior of their students. 

The fact that in Experiment 1 the experimental
group's performance, both in terms
of controlling their own behavior and in
terms of their ability to modify learner attending
behavior, was greater than that of the control
group tended to be consistent with the
findings of Moore et al. (Note 1), thus increasing
both the confidence in, and the
generalizability of, these findings. Further,
the fact that male teachers' effectiveness, as
measured by a reduction in the amount of
learner inattention, was greater for experimental
teachers than for control teachers
supports the hypothesis that the failure to
obtain differences for the comparable comparisons
in the Moore et al. (Note 1) study
was a function of the small size of the sample.

Not only were the differences between the
experimental and control groups statistically 
significant, but they were of practical sig- 
nificance. These practical differences can 
be observed, first, by the fact that the level 
of inattention noted for classrooms of ex- 
perimental teachers was approximately 5%, 
while the level of inattention in the class- 
rooms of control teachers was approximately 
23%. Second, experimental (e) teachers 
interacted with high-need students more 
than 1½ times more often than control (c)
teachers (M_e = 4.41 vs. M_c = 2.70).

The fact that teachers who received 
feedback associated with their ability to or- 
ganize instructional sequences appropriate 
for individual learners were more effective 
in designing appropriate instructional se- 
quences than were teachers who received 
feedback that was inappropriate for attain- 
ing the competency associated with in- 
structional sequencing provides support for 
the importance of giving feedback to teach- 
ers with respect to Competency 3. Specifi- 
cally, it was observed in Experiment 2 that 
experimental teachers engaged in hypothesis 
generation and testing procedures more than 
twice as often (M = 2.91) as control teachers
(M = 1.10) and nearly twice as many experimental
teachers (n = 11) initiated rationally
defensible sequences as did control
teachers (n = 6). Further, the number of
rationally defensible sequences also approached
twice as many for experimental
teachers (M_e = 2.91 vs. M_c = 1.67). Finally
and most importantly, the number of items
students of experimental teachers answered
correctly after first answering them incorrectly
was almost four times as great as for the
control group (M_e = 2.27 vs. M_c = .60).
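
The verbal comparisons in this discussion ("more than twice," "nearly twice," "almost four times") can be checked against the reported means with simple ratios. The snippet below is a convenience sketch; the dictionary labels are shorthand introduced here, while the numbers are those reported in the text.

```python
# (experimental value, control value) pairs as reported in the text.
comparisons = {
    "interventions with high-need students": (4.41, 2.70),
    "hypothesis generation and testing procedures": (2.91, 1.10),
    "teachers initiating defensible sequences": (11, 6),
    "rationally defensible sequences": (2.91, 1.67),
    "missed questions later answered correctly": (2.27, 0.60),
}

# Ratio of experimental to control for each comparison, to two decimals.
ratios = {label: round(e / c, 2) for label, (e, c) in comparisons.items()}

for label, ratio in ratios.items():
    print(f"{label}: {ratio}")
```

The resulting ratios are 1.63, 2.65, 1.83, 1.74, and 3.78, in the order listed.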

The fact that control teachers were re- 
ceiving feedback associated with Com- 
petencies 1 and 2 at the same time the ex- 
perimental group was receiving feedback 
with respect to Competency 3 suggests that 
the experimental results were not simply a 
function of the Hawthorne effect. 

Summary. The results of this study (a) 
extend the generalizability of the findings 
both in terms of population and competen- 
cies and (b) indicate that feedback associ- 
ated with the acquisition of competencies 
related to a problem-solving approach to 
instruction does increase the probability of
bringing about desired changes in both 
teacher and learner behavior. 
Tree Searching and Student Problem Solving


Donald L. Alderman 


University of Connecticut 


A process description of problem solving consistent with student performance 
has eluded psychologists in our attempts to anticipate problem difficulty and 
prescribe appropriate instruction. Yet, a goal tree—like a learning hierar- 
chy—imposes structure on a task and suggests the manner in which students 
might approach problem solving. The present study applies tree searching as 
a computer model of simple addition sentences with the general form m +n = 
p. It was found that the number of problem reductions performed in tree 
searching accounted for most of the variance across problems in student error 
rate as well as in time taken for solution. In effect, this technique constitutes 
a computer test for the adequacy of a prescription of how to solve problems. 


When students solve a problem, they 
usually go through a number of steps before 
reaching the final solution. Sometimes 
these intermediate steps appear as part of a 
written solution. In taking a classroom 
math test, for example, a student might be 
directed to show all work in order to receive 
full credit. But more often, as in multiple- 
choice tests, a student leaves no evidence to 
suggest the paths to a solution. This reflects 
our usual emphasis in education on the final 
solution or product from problem solving. 

If we want to know how a student goes 
about finding a solution to a problem, then 
our interest is in the process of problem 
solving. We would expect such an emphasis
in cognitive psychology, where concern for
thought and higher-order learning leads us
to emphasize covert behavior. Process is
highly relevant to educational psychology,
since teaching problem solving might rationally
parallel a student’s own process.
Indeed, Bruner (1968) argues that structure
and sequence constitute two essential features
of instruction. These same characteristics
determine to a large extent a problem’s
difficulty. Perhaps Gagné’s work
(1963, 1968, 1970) best illustrates this point: 
His learning hierarchies suggest a task’s 
structure in terms of its component parts 
and portray possible sequences for the 
completion of these parts to arrive at a final 
solution. Thus, a student might change 2 +
5 = _ to the form 5 + 2 = _ as a prerequisite
step in the addition of integers, but to do so,
he or she needs awareness of the commutative
law. Note that the solution itself is not
even involved in this preliminary stage of
problem solving. Imagine the complexity,
for a learner unfamiliar with the required
concepts, in solving a set of simultaneous
linear equations.


Goal Trees as Problem Reductions 


A learning hierarchy is one method for 
translating the covert behavior of learning 
into explicit aids for instruction. And it 
bears a close resemblance to current popular 
methods in the study of cognition, especially 
those of Newell and Simon (1972) called goal
trees. A tree allows us to picture the path to
a complex goal as a succession of simpler 
subgoals (just as application of the commu- 
tative law was a subgoal for the addition of 
integers). This reduction to subgoals is re- 
cursive: It can continue until all parts of a 
problem are laid out, and the path to a so- 
lution is clear. One common example of 
such reduction occurs when we parse sentences
according to grammatical rules. A
simple sentence has a subject phrase and
predicate phrase; the predicate in turn may
consist of a verb and its object. The verb is
“final” in that it corresponds to a part of
speech, but the object may be broken further
into an adjective and noun. Thus, such a
goal tree adequately represents a transformational
grammar.

chological Association, Inc. 0022-0663/78/7002-0209$00.75

As Simon (1969) wrote, there is ample 
evidence to suggest that problem solving is 
really a form of successive problem reduc- 
tions. Nowhere is the evidence stronger 
than in the research area called artificial intelligence.
This field began as an effort to
demonstrate with a computer such behaviors 
as playing checkers or proving logical theo- 
rems. It has now emerged with a discipline 
of its own that illustrates information pro- 
cessing as a way of understanding cognition 
(Hunt, 1968). 
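
The sentence-parsing example of problem reduction given earlier can be written as a short recursive procedure. This is a sketch under stated assumptions rather than the program used in the study: the rule table `GRAMMAR` and the function `reduce_goal` are hypothetical names, and the subject phrase is treated as final only because the text does not break it down further.

```python
# Each non-final goal maps to the subgoals it reduces to.
# Any goal absent from the table is treated as final (assumption).
GRAMMAR = {
    "sentence": ["subject phrase", "predicate phrase"],
    "predicate phrase": ["verb", "object"],
    "object": ["adjective", "noun"],
}


def reduce_goal(goal, rules):
    """Recursively reduce a goal to final parts.

    Returns (final_parts, number_of_reductions), where a reduction is one
    replacement of a goal by its subgoals.
    """
    if goal not in rules:
        return [goal], 0
    parts, reductions = [], 1  # count the reduction of this goal itself
    for subgoal in rules[goal]:
        sub_parts, sub_count = reduce_goal(subgoal, rules)
        parts.extend(sub_parts)
        reductions += sub_count
    return parts, reductions


parts, n_reductions = reduce_goal("sentence", GRAMMAR)
print(parts, n_reductions)
```

The count of reductions returned alongside the final parts is the kind of quantity that can serve as a simple measure of how much work a tree search performs on a given problem.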

Education has been slow to tap the rich- 
ness of computer science for the under- 
standing of instruction. One educational 
use of a tree structure has been to clarify 
different values in education. Page’s (1974) 
value tree branched from the top value 
(“bentee”) to major traits like verbal ability
and quantitative skills, then branched to 
further levels with subdivisions of these 
traits (for verbal, into grammar and litera- 
ture and others) and proceeded in this 
manner until a value became so specific as to 
suggest the content of a particular test item. 
In another paper, Goldin and Luger (Note 1) 
used game descriptions from artificial in- 
telligence to illustrate Piagetian structural 
principles. They show, for example, a fun- 
damental correspondence between the op- 
erations required to recognize equivalent 
states in tic-tac-toe and those needed to 
conserve number regardless of objects’ ar- 
rangement. Such correspondence, however, 
arose from descriptive parallels rather than 
from quantitative tests. The present study 
took tree searching as a model of process in 
simple addition and tested the extent to 
which measures of the computer process 
conformed with actual student perfor- 
mance. 


Models of the Addition Process 


Arithmetic exercises furnish familiar and 
clear cases for exploring the process of
problem solving (as demonstrated by Gagné, 
Mayor, Garstens, & Paradise, 1962). Cer- 
tainly, elementary arithmetic qualifies as a 
well-defined problem domain, since it has 
strict rules to govern operations and to judge 
solutions. Thus, we chose addition sen- 
tences (i.e., 29 + 3 = _) of the sort encoun- 
tered in elementary schools as test cases for 
tree searching. 

For single-digit integers, Groen and 
Parkman (1972) examined five models of the 
addition process. Each model assumed that 
addition is a reconstructive process that 
generates facts on the basis of stored rules. 
(By contrast, a reproductive process simply 
retrieves stored facts, as students might if 
they committed the addition table to mem- 
ory.) Their models explain the solution of 
a problem of the form m + n as a counting 
procedure. Since both addends were posi- 
tive whole numbers with a sum less than 10, 
the range of model relevance is obviously 
restricted. Only two operations take place: 
setting the counter to a specified value and 
incrementing it by one. Their five models
follow from the setting of the counter, which
may be (a) zero; (b) the leftmost addend m;
(c) the rightmost addend n; (d) the minimum
of m, n; or (e) the maximum of m, n.
The counter is then incremented x times to
arrive at the sum m + n. The “best” model
was that which would most closely match the
time children needed to respond. It proved
to be the last model, that with the counter
set to the maximum of m, n and then incremented
by the minimum of m, n. The
minimum of m, n, then, together with a
constant term standing for the setting operation,
accounted for 80% of the variance in
the response time taken by children. In
other words, if a child is to solve a problem
such as 2 + 5, the amount of time he or she
takes to reach the sum corresponds to how
far forward he or she must count from the
larger addend.
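
The five counting models differ only in where the counter starts, so they can be compared with one small function. The sketch below is an illustration, not Groen and Parkman's own formulation: the function name `increments` and the model labels ("zero", "left", "right", "min", "max") are introduced here for convenience.

```python
def increments(m, n, model):
    """Number of +1 steps a counting model needs to reach m + n.

    The counter is first set according to the model, then incremented
    by one until it equals the sum.
    """
    start = {
        "zero": 0,           # (a) counter set to zero
        "left": m,           # (b) counter set to the leftmost addend
        "right": n,          # (c) counter set to the rightmost addend
        "min": min(m, n),    # (d) counter set to the smaller addend
        "max": max(m, n),    # (e) counter set to the larger addend
    }[model]
    counter, steps = start, 0
    while counter < m + n:
        counter += 1
        steps += 1
    return steps


# The problem 2 + 5 under each model.
steps_by_model = {name: increments(2, 5, name)
                  for name in ("zero", "left", "right", "min", "max")}
print(steps_by_model)
```

For 2 + 5 the "max" model needs two increments, which equals the minimum of the two addends; this is the quantity the text identifies as the predictor of children's response time.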

Suppes (1967) described process models
for addition problems of the general form m
+ n = p + q with one unknown and three
addends given. In a linear regression,
he obtained multiple correlations of .8 for
both children's solution time and probabilities
of correct responses across problems.
The exercises themselves came from a
fourth-grade addition program and involved 
sums less than 40. The three variables em- 
ployed in the model as predictors of student 
performance were the magnitude of the sum, 
the magnitude of the smallest addend, and 
the number of steps in a solution algorithm. 
The solution algorithm reflected the number 
of formal steps necessary to transform a 
problem into canonical form (where only the 
unknown would be to the right of the equal 
sign), the number of single-digit addition 
operations, and the number of digits that 
must be held in memory or “carried.” These 
structural variables were meant to represent 
those problem properties that affect diffi- 
culty; they obviously succeeded, as judged by 
their strong correlation with student per- 
formance. 

The addition process has escaped atten- 
tion within artificial intelligence: It is too 
simple to qualify readily as “intelligent” 
behavior. Newell and Simon (1972) did 
devote a section of their book to the analysis
of cryptarithmetic problems like CROSS +
ROADS = DANGER (in which each letter
represents a digit). Their analyses, like
others in artificial intelligence, relied
essentially on the comparison of a student’s
solution protocol and that of a computer.
But for educational understanding, we need
not only models but also measures of those
models. Nilsson (1971) suggested the type
of measures appropriate as an index of
efficiency for computer searches: straightforward
counts of problem reductions. Since
these measures also seem to provide variables
that indicate the complexity of a process,
there is available a strong alternative to
protocol comparisons in the analysis of
computer process and student product.


Tree Searching as a Procedure 
for Addition 


Minsky (1970) made the dramatic claim
that “the computer scientist . . . is the
proprietor of the concept of procedure, the
secret educators have been so long seeking”
(p. 14). He has also suggested that
descriptions of thought processes restated become
prescriptions for the design of computer
programs (Minsky, 1966). We take a
converse position: that tree searching can
become a prescription for the design of
instruction. If a computer procedure gives an
accurate description of the addition process, 
it should be possible to portray students' 
covert behavior as a series of problem re- 
ductions in a goal tree. Then, our teaching 
can follow the same process description. 
This position depends on the fit between 
measures of the computer search and actual 
student performance. It also depends on 
empirical evidence demonstrating that stu- 
dents actually use successive goal reductions 
in problem solving (e.g., Newell & Simon, 
1972; Reed & Abramson, 1976). By con- 
structing a computer model of the addition 
process, we address only the former condi- 
tion. 

For the purpose of this study, it was im- 
portant to choose a set of addition tasks for 
which data on student performance were 
available as product measures. Suppes, 
Jerman, and Brian (1968) reported propor- 
tion of errors as well as latency for student 
responses to problems in an elementary 
school arithmetic program. A sample of 38 
addition problems was therefore selected 
from their work (Suppes et al., 1968, pp. 
207-208). These tasks had the uniform
format m + n = p that called for the addition 
of whole numbers, and the sums fell between 
20 and 50. The position of the unknown 
quantity, the place holder, varied across 
problems. Thus, there were three variations 
of the general form: m + n = __, m + __ = p,
and __ + n = p. Although selecting these
sample tasks preceded constructing our 
model, the process description posed below 
draws on findings unrelated to the source for 
data on student performance. 


Process Structure 


The goal tree of the process for these problems
is shown as Figure 1. The top node of the
tree indicates the general form m + n = p
and is the starting point for the computer
search. Branches below that node lead to
three sets of subgoals, according to the
position of the place holder: m + n = __, m +
__ = p, and __ + n = p. At this second level,
each term in the problem statement splits
into a separate node. Arcs join the three
terms of a statement as conjunctive (AND)
subgoals: All three subgoals must be
attained in order to reach a solution. The
absence of an arc between branches implies
disjunctive (OR) subgoals, or alternative
paths to solution. Thus, the sets of subgoals
for m + n = __, m + __ = p, and __ + n = p
represent three distinct paths to solution.
Which path should be taken obviously
depends on the position of the place holder. In
effect, that position forces a different
approach to solution.

Figure 1. Goal tree for simple addition. (m is the leftmost addend, n is the rightmost addend, and p
is the sum. Circled numbers represent nodes.)

These different approaches become evi- 
dent at the third level of the tree (see Figure 
1). Branches below each task format specify 
what operations to follow, and the nodes 
define the range of numerical values. If the 
form is m + n =__, it simply requires the 
recall of an addition fact. This reproductive 
process takes advantage of the small size of 
the second addend, 0 ≤ n ≤ 9. It assumes
that such recall from memory depends only 
on the units' digit of the second addend.
The four nodes below n, then, follow the
difficulty index for single digits used by
Wexler (1970). These express the relative
difficulty a child encounters with different
addition facts.

If the form is m + __ = p, there is a
counting procedure with two operations. A
counter is at first set to m and then
incremented by one until it reaches p. The
number of times the counter must be
incremented is the value assigned to the place
holder. This process is similar to that of
Groen and Parkman (1972), except that
setting the counter depends on the
magnitude of its initial value. The nodes below m,
when the task format is m + __ = p, allow for
that setting operation. Here, the initial
value is given as a multiple of 10, and so the
operation for counter setting depends on the
tens' digit of the addend. However, the
initial value actually assigned to the counter
(m) must be the full value of the addend.
The count forward from m has been
simplified to just three nodes below the place
holder (Nodes 16, 17, and 18). It
reflects incrementing as an operation in a
manner consistent with that for setting a
counter. Had there been unique nodes for
all initial values of the counter, a separate 
node for each possible value of the place 
holder might be appropriate. But such 
distinctions soon make the addition process
unwieldy, whether in a goal tree or for a
child.

With the unknown in the position m, as in 
__ + n = p, the model uses a process similar
to counting forward. It, too, consists of two 
operations. The first is still setting the 
counter to an initial value, but the second is 
decrementing the counter rather than in- 
crementing it. The initial counter value is 
set to p (Nodes 25, 26, 27, and 28) and then 
decremented n times (Nodes 22, 23, and 24). 
Thus, this process is counting backward
from the sum to determine the missing
addend.
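The three procedures described in this section can be summarized in a small sketch. It is a simplified illustration under stated assumptions: recall of an addition fact is stood in for by direct addition, the counter is set in a single step rather than through the tens'-digit nodes of Figure 1, and the function name `solve` and its `None` convention for the place holder are introduced here, not taken from the original model.

```python
from typing import Optional


def solve(m: Optional[int], n: Optional[int], p: Optional[int]) -> int:
    """Fill the place holder in m + n = p; exactly one argument is None."""
    if p is None:
        # Form m + n = __: reproductive process (stand-in for fact recall).
        return m + n
    if n is None:
        # Form m + __ = p: set the counter to m, count forward to p.
        counter, increments = m, 0
        while counter < p:
            counter += 1
            increments += 1
        return increments
    if m is None:
        # Form __ + n = p: set the counter to p, count backward n times.
        counter = p
        for _ in range(n):
            counter -= 1
        return counter
    raise ValueError("one of m, n, p must be None (the place holder)")


if __name__ == "__main__":
    print(solve(26, 0, None))   # m + n = __
    print(solve(30, None, 37))  # m + __ = p
    print(solve(None, 1, 37))   # __ + n = p
```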


Search Sequence 


The arrangement of nodes in the tree 
guides the computer solution process. At 
first, the search moves from the top down in 
order to reduce the problem to simpler 
subgoals. Where arcs link branches, the
top-down strategy generates a set of
conjunctive subgoals as in m + n = __ (Nodes 2,
3, and 4 in Figure 1). Then, the computer
tries to locate each subgoal in the problem or
reduce it still further. This search for
subgoals moves left to right: First it checks
for the term m (Node 2). Given a statement
of the form __ + n = p, obviously m does not
occur; the search moves to the next set of
subgoals. Here, the next move is to generate
m + __ = p (Nodes 9, 10, and 11). Now it is
possible to reduce m in the goal tree to
simpler subgoals (Nodes 12, 13, 14, and 15), but
still there is no match because m has no
value in the problem statement. The search
must turn to the rightmost set of subgoals in
the second level of the tree (Nodes 19, 20,
and 21). Since the statement does indeed
conform to this format, a solution is found.

In general, the search in this addition
tree is depth first (see Nilsson, 1971). It
expands a problem to subgoals from the top
down and exhausts these reductions before
attempting to move to the right to find an
alternative path. But these alternatives
become evident as the search shifts from left
to right in the goal tree along disjunctive
branches. Thus, the arrangement of nodes
in our tree for the addition process is
critical.
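A depth-first, left-to-right search over an AND/OR goal tree can be sketched as follows. The tree here is deliberately reduced to two levels (one OR node over three AND formats, each with three term subgoals), so its node counts do not reproduce those of Figure 1; the `Node` class, the `applies` predicate, and the function names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    name: str
    kind: str = "leaf"  # "or", "and", or "leaf"
    children: List["Node"] = field(default_factory=list)
    applies: Callable[[dict], bool] = lambda problem: True


def term(fmt: str, name: str, known: bool) -> Node:
    # A term subgoal matches when the problem gives (or omits) the term as expected.
    return Node(f"{fmt} / {name}",
                applies=lambda pr: (pr[name] is not None) == known)


def fmt_node(label: str, unknown: str) -> Node:
    # Conjunctive (AND) subgoals: all three terms must match.
    return Node(label, "and",
                [term(label, t, t != unknown) for t in ("m", "n", "p")])


# Disjunctive (OR) branches, ordered left to right as in the text.
TREE = Node("m + n = p", "or", [
    fmt_node("m + n = __", "p"),
    fmt_node("m + __ = p", "n"),
    fmt_node("__ + n = p", "m"),
])


def dfs(node: Node, problem: dict, visited: List[str]) -> Optional[List[str]]:
    """Return the nodes on the solution path below `node`, or None on failure."""
    visited.append(node.name)          # every attempted node is counted
    if not node.applies(problem):
        return None
    if node.kind == "leaf":
        return [node.name]
    if node.kind == "and":
        path = [node.name]
        for child in node.children:    # all subgoals must succeed
            sub = dfs(child, problem, visited)
            if sub is None:
                return None
            path.extend(sub)
        return path
    for child in node.children:        # "or": try alternatives left to right
        sub = dfs(child, problem, visited)
        if sub is not None:
            return [node.name] + sub
    return None


def search(problem: dict):
    """Return (nodes attempted, nodes on the solution path)."""
    visited: List[str] = []
    path = dfs(TREE, problem, visited)
    return len(visited), len(path)


if __name__ == "__main__":
    print(search({"m": 26, "n": 0, "p": None}))
    print(search({"m": None, "n": 1, "p": 37}))
```

The problem is passed as a dictionary with exactly one `None` entry marking the place holder. Because failed branches are still recorded in `visited`, formats further to the right cost more attempted nodes, which is the property the left-to-right ordering is meant to capture.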

The left-to-right order of m + n = __, m +
__ = p, and __ + n = p is from Weaver (1971).
When the position of the unknown quantity 
changes across p, n, and m, the addition 
sentence increases in difficulty. In setting 
a counter, the order of different initial values 
(Nodes 12, 13, 14, and 15 as well as Nodes 25, 
26, 27, and 28) is a direct result of their 
magnitude. The actual count, forward or 
backward, follows from the magnitude of the 
increment (Nodes 16, 17, and 18) or decre- 
ment (Nodes 22, 23, and 24) required in 
order to arrive at a solution. The left-to- 
right arrangement of nodes, therefore, re- 
flects either relative difficulty or relative 
size. 

The search tries reductions before it at- 
tempts to identify alternative paths. Such 
a priority seems consistent with our own. 
We too tend to pursue relational connections 
(like further problem reductions) before we 
seek disjunctive associations (see Hunt & 
Hovland, 1960). How common it is that we
... solution to a problem only to repeat
... The search behaves
much as we do: It persists in following re- 
lational connections along one path before 
forsaking that path and trying another. 


Measures of Process and Product 


If we count the problem reductions in a 
tree search, we can consider it a measure of 
the computer's addition process. We would 
always expect the search procedure to find 
a correct solution; an error proves the model 
an incomplete working description of the 
addition process. Therefore, we seek not a 
simple right-wrong measure, but the num- 
ber of subgoals used in the process. For 
example, the total number of subgoals
generated in solving 26 + 0 = __ was 5 (Nodes 1,
2, 3, 4, and 5 in Figure 1), while for __ + 1 =
37, it was 15 (Nodes 1, 2, 3, 4, 9, 10, 11, 12, 13,
19, 20, 21, 22, 25, and 26). Hereafter, we
refer to the total number of nodes in a tree
search as n_t. In this count, we actually
included every problem reduction attempted
and not just those that led to subgoals (e.g.,
after a terminal node like Node 2). There
was also a simpler process measure that
yielded greater consistency across problems:
the number of subgoals required to attain a
solution (n_g). This variable had a value of
5 for the problem 26 + 0 = __ and 6 (Nodes
1, 19, 20, 21, 22, and 26) for __ + 1 = 37. If
the goal tree is an adequate description of
the process, then n_g must be less than or
equal to n_t, and the nodes counted for n_g
must be a subset of those for n_t. In other
words, a search procedure has to use the re- 
quired subgoals in order to solve the prob- 
lem. 

As indicated earlier, some useful data on
student problem solving were already
available in the literature. We chose to use 
data from a report on the Stanford Arith- 
metic Program (Suppes et al., 1968) because 
it was one of the few sources that listed both 
percentage of students in error and the time 
taken to reach a correct response. The 38 
addition sentences (like m + n = p) were 
from exercises appropriate for a fourth-grade 
arithmetic program. The student group 
consisted of about half boys and half girls 
from a middle-class, suburban elementary 
school, and most were above the national 
mean in intelligence. Their performance 
across the 38 addition sentences furnished 
the product measures for our model of stu- 
dent problem solving. 

With measures for both computer tree
searching and student problem solving, it
was possible to test the prescription for the 
addition process against actual problem 
difficulty. This was done through multiple
linear regressions, where both n_t and n_g were
independent variables and either error rate 
or response latency was the dependent 
variable. The regressions employed process 
variables from tree searching to account for 
variations in student solutions across addi- 
tion tasks. 
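A regression of this kind, with two process counts as predictors, can be sketched with ordinary least squares. The sketch below is not the authors' analysis: the function name `fit_two_predictors` is hypothetical, and the data in the usage example are invented solely to show the call, not taken from Suppes et al. (1968).

```python
def fit_two_predictors(x1, x2, y):
    """Least squares for y = b1*x1 + b2*x2 + a; returns (b1, b2, a, R)."""
    def mean(values):
        return sum(values) / len(values)

    m1, m2, my = mean(x1), mean(x2), mean(y)
    # Centered sums of squares and cross-products.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero if the predictors are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    predicted = [b1 * u + b2 * v + a for u, v in zip(x1, x2)]
    ss_total = sum((w - my) ** 2 for w in y)
    ss_resid = sum((w - f) ** 2 for w, f in zip(y, predicted))
    multiple_r = max(0.0, 1.0 - ss_resid / ss_total) ** 0.5
    return b1, b2, a, multiple_r


if __name__ == "__main__":
    # Invented values, one entry per problem (the unit of observation).
    n_t = [5, 15, 8, 9, 12]
    n_g = [5, 6, 5, 5, 6]
    error_rate = [0.10, 0.31, 0.15, 0.19, 0.22]
    print(fit_two_predictors(n_t, n_g, error_rate))
```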

It is important to note that the unit of 
observation was the problem itself, not the
individual student. This reflects interest in
describing a structure and sequence for
instruction consistent with the way students
actually approach addition sentences like m
+ n = p. The value of such a description
depends on explaining the problem difficulty 
and therefore on regressions across problems
rather than across students.


Results and Discussion 
Problem Difficulty 


With the proportion of students committing
an error on a problem (r_i) as the
dependent variable and the number of problem
reductions (n_t and n_g) as independent
variables, a linear regression yielded a high
multiple correlation (R = .90) and a highly
significant test for the associated analysis of
variance, F(2, 35) = 72.40, p < .001. Thus,
the measures derived from tree searching
accounted for a high proportion of the
variation in student error rate across problems.
Indeed, the single coefficients associated
with the numbers of problem reductions
proved to be significant: for n_t, t(35) =
11.40, p < .01, and for n_g, t(35) = -4.26, p <
.01. That is, both furnished useful
information for estimating problem difficulty.

The coefficients (b_i) established in the
regression for error rate, together with the
process measures for each problem,
determined the prediction equation


f_i = b_{n_t} n_t + b_{n_g} n_g + a,


where f_i was the predicted probability of an
incorrect response for the ith problem and
a the constant term. Figure 2 depicts the
observed and predicted values for error rate
across addition sentences. From inspection,
it appears that the predictions for error rate
fit the actual data fairly well.

The chi-square statistic provided a test for
the goodness of fit between observed and
predicted error rate. While this statistic
normally applies to discrete events or sample
and population parameters, it also provides
a test for accuracy across the nominal
categories of addition tasks. Calculations had
to approximate the frequency of incorrect
responses based on the total number of
students (21) and the probability of error on a
given problem. The distribution of
predictions from tree searching did not represent
a significant departure from the distribution
of observed problem difficulties, χ²(37) =
48.74, p > .05. Yet, the level at which the
reported chi-square statistic is significant
(p < .10) further suggests a reasonable
probability that predicted error rates failed to
match observed problem difficulties. This
result still compares favorably with the
goodness of fit for other models of the
addition process (e.g., Suppes et al., 1968) and
seems to indicate a relatively close fit of the
predictions from tree searching to the data
on student problem solving.

Figure 2. Observed and predicted error rate across problems. (Problem numbers are arranged in
rank order of observed error rate.)
[Horizontal axis: problem number.]


Solution Time


The analysis of variance for the second
regression, which attempted to explain
differences in response latency across problems,
was also highly significant, F(2, 35) = 42.52.
Process measures again showed a
strong relationship to actual student data, as
indicated in the multiple correlation (R =
.84). ... problem reductions as a predictor of
time in student problem solving. ...

Figure 3. Observed and predicted success latency across problems. (Problem numbers are arranged
in rank order of observed error rate.)
[Vertical axis: mean latency of correct response, in seconds.]


Consistency of Results


The extent to which tree searching provided
an explanation of student problem
solving was consistent with findings for other 
models of the addition process. Groen and 
Parkman (1972) accounted for 80% of the 
variance in solution time for single-digit 
addition. Our measures of problem reduc- 
tions gave a process description that covered 
about 70% of the variance in solution time 
and 80% of the variance in problem diffi- 
culty. In effect, the goal tree incorporated 
the counting model for single-digit addition 
as part of a broader process description. It 
also introduced an operation for setting a 
counter’s initial value depending on the 
magnitude of the term. Despite such ex- 
pansions, the results are comparable for the 
two studies. 
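The step from a multiple correlation to a proportion of variance explained is simply squaring R. A minimal check, using the two multiple correlations reported in this article (.90 for error rate, .84 for latency); the function name `variance_explained` is introduced here only for illustration:

```python
def variance_explained(multiple_r: float) -> float:
    """Proportion of variance accounted for by a regression with multiple R."""
    return multiple_r ** 2


if __name__ == "__main__":
    for label, r in (("error rate", 0.90), ("latency", 0.84)):
        print(label, round(variance_explained(r), 2))
```

The squared values, about .81 and .71, correspond to the "80%" and "about 70%" figures quoted in the text.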

Apparently, our process measures are 
more strongly related to student perfor- 
mance than are formal properties of the task. 
Suppes and his colleagues (1968) had based 
their model of the addition process on the 
magnitude of the sum, the magnitude of the 
smallest addend, and a count of the formal 
operations involved in problem solving. 
Their own multiple correlations for error 
rate and latency were .69 and .56, respec- 
tively, while our process variables taken from 
tree searching yielded correlations of .90 and
.84, using identical student data. However,
our AND/OR goal tree for the addition pro- 
cess is applicable only to the problem format 
m + n = p. The model derived from formal
task properties has explanatory merit for
problem formats like m + n = p + q. For
such addition tasks, the strength of rela- 
tionship with student performance (R = 6 
Suppes, 1967) was consistent with our
correlations.

While the results obtained with tree 
searching were consistent with previous work
on the addition process, the method
departed from other models. There is
precedent for the use of goal trees as
representations of problem solving. Certainly, the tree
structure resembles a learning hierarchy, and
such prerequisite, conjunctive, and
disjunctive descriptions appear elsewhere
than in AND/OR goal trees. Nevertheless,
tree searching is testable and offers a simple,
yet specific, parallel to student problem
solving. Problem reductions give us an
explicit measure of process to complement the
usual evidence of student products.

Further work needs to be done to explore
procedures like tree searching as process
descriptions of student problem solving,
especially in education. A goal tree, like a
learning hierarchy, encourages us to specify 
the structure of a task in a manner that 
might facilitate student learning. Results 
from this study suggest a strong parallel 
between problem reductions and problem 
difficulty. If students do solve problems as 
a series of subgoals, a goal tree would be a 
reasonable structure for instruction on a 
specific task. Furthermore, tree searching 
detects and takes advantage of relations
among subgoals to simplify the sequence of
problem solving. This implies an order for
presenting a problem’s steps consistent with
the way students approach a solution. Here,
then, is a computer model for the kind of
assumptions we must make in teaching. It
remains to be demonstrated, by taking
measures of both process and product from
student problem solving, that students do
approach problems in a manner consistent
with tree searching and can take advantage
of such a procedure in learning mathematics.
Whether similar models can be applied in
subjects other than mathematics is also left
to be shown. The potential for application
seems evident from the diverse uses of
learning hierarchies within education and
goal trees within cognitive psychology.


Reference Note 


1. Goldin, G. A., & Luger, G. F. Artificial intelligence
models for human problem solving. Paper
presented at the meeting of the American Educational
Research Association, New Orleans, February


A Search for Social Intelligence


Daniel P. Keating 
Institute of Child Development, University of Minnesota 


Efforts to highlight educationally neglected areas of abilities have often fo- 
cused on the putative domain of “social intelligence.” The empirical coheren- 
cy of this domain was investigated in a group of college students (N = 117), 
since Dye and Very’s and others’ findings would predict maximum differentia- 
tion of abilities in this age group. Three measures of “academic” intelligence 
and three measures of “social” intelligence were used. Major findings on the 
social domain were (a) intradomain correlations were no higher than interdo- 
main rs, (b) factor analyses produced no identifiable social factor, and (c) “ac- 
ademic" measures were better at predicting a social competence criterion than 
"social" measures. Implications for research and practice are discussed. 


In recent years, one of the most persistent
recommendations regarding research in
human abilities has been to diversify and 
broaden the concept of ability to include 
areas other than those school-defined abili- 
ties traditionally assessed. It has been ar- 
gued that the domains of academic aptitude 
and achievement, or even of intelligence in 
the IQ sense, are too restrictive to account for 
all of the educationally relevant individual 
differences in abilities (e.g., Neisser, 1976). 
An area that has been singled out as both 
important and neglected is that of social 
intelligence (Flapan, 1968; Flavell et al., 
1968; Selman, 1976). As is the case with 
numerous global psychological constructs, 
it is relatively difficult to pinpoint what 
abilities or skills do or do not fall within the 
domain, or even whether an empirically co- 
herent domain exists. If some empirical and 
conceptual coherency for the domain can be 
demonstrated, the subsequent question of 
its relationship to other domains then 
arises. 

This article reports a preliminary inves- 
tigation of this domain of social intelligence. 
The approach was to use variables that had 
some empirically demonstrated relationship
to the broad domain of social intelligence (or
social competence if one prefers to reserve
“intelligence” for more traditional uses),
together with several standard
ability tests. It was then possible to look for
empirical evidence of a coherent domain
using several related techniques described
below and to examine its relationship to
academic intelligence, if warranted.

The first step was the selection of the
measures, which in an undefined area such
as this was somewhat difficult. Three
criteria were employed for selecting the social
intelligence measures: (a) location within
the conceptual domain of social intelligence,
(b) existence of reasonable validational
evidence, and (c) objective scorability and
reliability. Further constraints were
representation of the domain and
pragmatic concerns of time and number of
measures. A longer tradition of research on
standard cognitive ability tests made
selection of a representative group of these
easier.

Three lines of correlational analysis were
pursued with the resulting data, in each of
which it was possible for the underlying
structure of a social intelligence domain to
be revealed. The first is a convergent-
discriminant analysis, that is, Do the
within-domain measures correlate more highly
with each other than their average
correlation with measures outside the
domain? The second is factor analysis, that is,
Do the social intelligence measures load on
a single factor different from the one or more
factors within a standard cognitive ability 
domain? The third is criterion prediction, 
that is, Are social intelligence measures good 
at predicting some reasonable criterion of 
social maturity or competent social func- 
tioning? If positive answers were obtained 
for one or more of these questions, the fur- 
ther analysis of the relationship between the 
two domains could be carried out. 


Method 


Subjects 


The participants in this study were 117 college stu- 
dents, 63 of whom were women and 54 men. They were
predominantly juniors at a large midwestern university 
who were recruited in a psychology class. The average 
age for the group was just over 20 years. All partici- 
pants were unaware of the hypotheses of the study, and 
none could recall having taken any of the tests used in 
the study previously. The means, standard deviations, 
and intercorrelations on the relevant measures for the 
men and women separately were not substantially dif- 
ferent, and they were thus combined into a single sam- 
ple for the analyses reported. Besides availability, 
college students were appropriate because of the find- 
ings Dye and Very (1968) and others have reported, 
which suggest that the differentiation of abilities
accelerates during early adolescence and is completed by
late adolescence. This sample, then, provides the 
maximum opportunity for studying different domains 
of abilities. 


Measures 


Academic intelligence domain. Three measures 
designed to tap a broad range of cognitive abilities 
presumably related to traditionally intellectual
accomplishments were used: vocabulary and verbal
reasoning, nonverbal or abstract reasoning, and
associative thinking or word fluency.

1. The Concept Mastery Test (CMT; Terman, 1950)
has two parts: vocabulary concepts (requiring the
decision whether two words are synonyms or antonyms)
and verbal reasoning (choosing the correct completion
of a verbal analogy). In the analyses reported below,
Parts 1 (CMT 1) and 2 (CMT 2) are used as separate
scores, since they tap somewhat different abilities. The
analyses were also carried out with the total CMT score,
however, and this produced no difference in any of the
results.

2. The Standard Progressive Matrices (SPM; Raven,
1960) was originally designed as a pure measure of “g”
(general intelligence) but is more appropriately
considered a test of nonverbal reasoning ability. It requires
choosing the correct figure to complete a matrix.

3. The Remote Associates Test (RAT; Mednick &
Mednick, 1967) was designed to provide a measure of
associative thinking (i.e., making correct connections 
among concepts that are associated but not logically 
related). This is the least validated of the measures in 
this domain, but seems to tap a relevant ability (i.e., 
word fluency) not included in the other measures. 

Social intelligence domain. Widely different in- 
terpretations are possible regarding the proper con- 
ceptual delineation of this domain. The attempt was 
thus to sample as representative a group of validated 
measures as possible: formal reasoning about the re- 
lationship of the individual to society and to some basic 
elements of societal functioning (i.e., moral or ethical 
reasoning), interpersonal reasoning or insight (i.e., ap- 
propriate choices in a situation involving interpersonal 
conflict), and competent social functioning (i.e., the 
ability to act appropriately in social settings as judged 
by other individuals). 

1. The Defining Issues Test (DIT; Rest, Note 1) is an 
objectively scorable measure of Kohlberg's (Note 2) 
stage theory of the development of moral reasoning. 
Considerable validational evidence is available for the 
DIT, unlike Kohlberg’s Moral Judgment Interview, 
which has been criticized by Kurtines and Greif (1974) 
for its poor psychometric properties. 

2. The Social Insight Test (SIT; Chapin, 1942) is a
multiple-choice test that presents a series of social
dilemmas and asks the respondent to select the best
alternative for dealing with the problem. It was empirically
keyed using criterion groups judged as socially
effective or not effective and subsequently validated on 
other similar criterion groups (Gough, 1968). 

3. The Social Maturity Index (SMI; Gough, 1966)
was derived in a slightly different fashion from those 
tests described above. It is a regression equation to 
predict effective social functioning as judged by peer 
groups. It uses scales from the California Psychological 
Inventory (Gough, 1969), which is a personality test 
rather than a cognitively oriented measure. It is in- 
tended as an estimate of the behavioral end of the di- 
mension. Because of the SMI’s somewhat different 
method of measurement, correlational results both in- 
cluding and excluding it are reported. For the analysis 
involving the prediction of a criterion, it is used as the 
criterion measure of social skill. High scores on the 
SMI are described as tactful, reliable, capable, and 
foresighted, whereas low scores are described as rude, 
coarse, defensive, and obnoxious (Gough, 1966, pp.
194-195).


Procedure 


Each of these six measures was administered in a
1-hour session, with the sessions held on 6 days 1 week apart.
No time limits were set, and little difficulty due to time
pressure was reported.


Results 


The scores of the students in this sample 
were analyzed to determine if they were a 
representative sample of college students.

Table 1
Means and Standard Deviations on All
Measures

                                 Current sample     College norm groups
Test                               M        SD               M
Academic
  Concept Mastery Test            59.6     28.4            73.0 (a)
  Standard Progressive Matrices   54.4      4.4              -
  Remote Associates Test          15.2      4.3            16.0
Social
  Defining Issues Test            54.2 (b) 18.1            54.9
  Social Insight Test             23.2      4.9            25.3
  Social Maturity Index           51.8      3.2            50.8

(a) This is the mean for college graduates.
(b) This is the P score from the Defining Issues Test, that is, the
percentage of principled moral judgment statements receiving
preferential endorsement.


On the four measures for which college stu- 
dent norms were available (DIT, RAT, SIT, 
and SMI), there were no significant differ- 
ences between this sample and those norms. 
Explicit norms for this group were not 
available on the CMT and SPM, but the 
norms that were available accorded well with 
the belief that this was a representative 
group of college students. Means, standard 
deviations, and college student means, where 
available, are reported in Table 1. 


Convergent-Discriminant Evidence 


The first analysis compared the
intercorrelations of the social intelligence measures
to their average intercorrelation (using 
Fisher's z transformation) with academic 
intelligence measures. Excluding SMI, the 
DIT-SIT correlation was .29 (all rs are pos- 
itive). The average intercorrelation of DIT 
and SIT with the CMT, SPM, and RAT was 
.25. Including SMI, the within-domain
average r was .16, and the across-domain
average r was .23. All the rs are positive and
significant, but it is clear that the within- 
domain correlations are no greater than the 
across-domain correlations. 
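Averaging correlations through Fisher's z transformation, as mentioned above, can be sketched in a few lines. The function name `average_r` is an assumption, and the inputs in the usage example are placeholder values, not the study's individual correlations.

```python
import math


def average_r(correlations):
    """Average correlations via Fisher's z: r -> atanh(r), mean, then tanh."""
    zs = [math.atanh(r) for r in correlations]
    return math.tanh(sum(zs) / len(zs))


if __name__ == "__main__":
    # Placeholder correlations chosen only to show the call.
    print(round(average_r([0.10, 0.50]), 3))
```

Because the transformation stretches larger correlations, the averaged value lies slightly above the plain arithmetic mean when the inputs differ.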

A potentially confounding factor in these 
comparisons is the differential reliability of 
the measures. In Table 2, the correlations 
among all measures, their internal consis- 
tency reliabilities, and the correlations cor- 
rected for measurement error are shown. 
The reliabilities are adequate with the ex- 
ception of the DIT. This measure has fewer 
items than the others, thus lowering its in- 
ternal consistency reliability, but other 
studies have found higher test-retest
stabilities, around +.80 (see Rest, Note 1).
The lower reliability obviously inflates the
corrected correlations involving the DIT, but
the within- versus across-domain comparisons
are essentially unchanged. Using the
corrected correlations, the DIT-SIT r is .48,
and the average across-domain r is .36. The
former is now slightly higher, but there is no
major difference. Including all three social 
measures, the average within-domain cor 


Table 2 
Correlations Among All Measures in the Study 
Test 1 2 3 4 5 6 7 

1. CMT 1 [.95] 46** DU 27 32** 29% 20° 
2, CMT 2 (49) [94] 39** 36** 35** 30** 18° 
3. SPM (.24) (.34) [.78] 35** 07 33** 8 
4. RAT (.33) (32) (34) [71] 16 .18* 20 
5. SIT (.36) (.36) (.09) (.21) [.84] 39** 09 
6. DIT (45) (46) (57) (33) (48) [43] de 
7. SMI (.25) (.29) (12) (.21) 


Note. Above-diagonal data are raw correlations. Below-diagonal data are correlations in parentheses corrected for attenuation 
due to measurement error. Bracketed values on the diagonal are internal consistency reliabilities for each measure in 
this sample. CMT 1 = Concept Mastery Test, Part 1 (vocabulary); CMT 2 = Concept Mastery Test, Part 2 (analogies); SPM 
= Standard Progressive Matrices (nonverbal reasoning); RAT = Remote Associates Test (verbal fluency); SIT = Social Insight 
Test; DIT = Defining Issues Test (moral judgment); SMI = Social Maturity Index. 
a This is a lower bound estimate using the Spearman-Brown prophecy formula, with three sections of the DIT composing the 
test battery. The DIT rating scheme lends itself better to this internal consistency estimate (Stanley, 1971). 


* p < .05. 
** p < .001. 
Table 3 
Unrotated Principal Factors, Communalities, and Rotated Factors of the Variables in the Study 


Principal factors 
Test h²ᵃ Factor 1 Factor 2 

CMT 1 .28 63 —.14 
CMT 2 31 66 -40 
SPM 19 AT Al 
RAT 14 ESI 44 
SIT 19 48 —.34 
DIT 21 .50 .05 
SMI .08 .30 A3 

% variance 25.9 5.0 


Varimax rotation factors 


Test    h²          Factor 1     Factor 2 
CMT 1   .42 (.46)   .56 (.57)    .32 (.36) 
CMT 2   .45 (.48)   .56 (.58)    .37 (.38) 
SPM     .39 (.51)   .08 (.09)    .62 (.71) 
RAT     .21 (.29)   .23 (.27)    .39 (.46) 
SIT     .34 (.40)   .58 (.63)    .07 (.08) 
DIT     .26 (.58)   .34 (.52)    .37 (.56) 
SMI     .11 (.17)   .14 (.17)    .30 (.37) 

% variance          16.5 (20.6)  14.4 (20.7) 


Note. Values in parentheses are communalities and loadings corrected for measurement unreliability. CMT 1 = Concept Mastery 
Test, Part 1 (vocabulary); CMT 2 = Concept Mastery Test, Part 2 (analogies); SPM = Standard Progressive Matrices (nonverbal 
reasoning); RAT = Remote Associates Test (verbal fluency); SIT = Social Insight Test; DIT = Defining Issues Test (moral judg- 
ment); SMI = Social Maturity Index. 
a These are communalities based on principal components. 


relation is .28, and the average across-do- 
main correlation is .33. Thus, there is little 
evidence in either the raw or the corrected 
correlations for the discriminant validity of 
a social intelligence domain. 
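
The correction for measurement error used in these comparisons is, on the usual reading, the classical correction for attenuation: the raw correlation divided by the square root of the product of the two reliabilities. A minimal sketch, assuming that formula and taking the DIT-SIT raw r (.29) from the text and the diagonal reliabilities of Table 2 (.84 for SIT, .43 for DIT) as read from the scan; the function name is hypothetical.

```python
import math


def disattenuate(r_xy, r_xx, r_yy):
    """Correct a raw correlation for unreliability in both measures."""
    return r_xy / math.sqrt(r_xx * r_yy)


# DIT-SIT: raw r = .29; reliabilities .84 (SIT) and .43 (DIT).
corrected = disattenuate(0.29, 0.84, 0.43)
print(round(corrected, 2))  # 0.48
```

The result matches the corrected DIT-SIT r of .48 reported above, and it shows why the low DIT reliability inflates every corrected correlation that involves the DIT.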


Factor Analytic Evidence 


A principal-axes factor analysis with it- 
erations¹ was carried out on seven variables, 
including two parts of the CMT. The ei- 
genvalues of the first two principal compo- 
nents were 2.46 and 1.04, accounting for 
35.3% and 14.8% of the variance, respec- 
tively. Given these eigenvalues, a two-factor 
solution was selected. The two principal 
factors and their communalities, and a var- 
imax rotation of those factors, are shown in 
Table 3. The results are clear-cut. Factor 
1 is defined by CMT and SIT, and Factor 2 
is primarily defined by SPM, with moderate 
loadings on all others except SIT. Several 
aspects of Table 3 are worth noting. First, 
a separate factor of social reasoning or social 
intelligence does not appear in the analysis. 
Second, only slightly more of the variance of 
the academic measures, except RAT, is ac- 
counted for by Factors 1 and 2 (SPM, 39%; 
CMT 1, 42%; and CMT 2, 45%) than for the 
social reasoning measures (DIT, 26%; SIT, 
34%; and SMI, 11%). The communalities 
from the principal components are even 
more similar. Third, a correction of the 
factor loadings for measurement unreliabi- 
lity again changes the overall picture little, 
except that the amount of moral judgment 
variance potentially accounted for by these 
factors is difficult to estimate. Finally, the 
SMI has the lowest communality, perhaps 
because it is more behaviorally (as opposed 
to cognitively) oriented. 


1 All factor analyses and the multiple regression 
analysis used the Statistical Package for the Social 
Sciences (SPSS; Nie et al., 1975). The PA2 (principal 
factoring with iteration) option was used for the factor 
analyses. 
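
The percentages of variance "accounted for by Factors 1 and 2" are communalities, that is, the sum of a measure's squared loadings on the retained factors. A minimal sketch, assuming orthogonal factors and using the varimax loadings for CMT 1 and SPM as they can be read from Table 3; the function name is hypothetical.

```python
def communality(loadings):
    """Sum of squared loadings across orthogonal factors."""
    return sum(loading * loading for loading in loadings)


# Varimax loadings (Factor 1, Factor 2) as read from Table 3.
cmt1 = communality([0.56, 0.32])
spm = communality([0.08, 0.62])
print(round(cmt1, 2), round(spm, 2))  # 0.42 0.39
```

These reproduce the 42% and 39% quoted in the text for CMT 1 and SPM.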


Prediction of Criterion Evidence 


In this analysis, the SMI was used as the 
criterion because it was the best estimate of 
the behavioral component of social skill and 
was validated in previous research as such. 
A stepwise multiple regression of SMI on the 
remaining six variables was carried out using 
the liberal inclusion limitation of the SPSS 
program. Four variables were selected in 
the following order: RAT, CMT 1, SPM, 
and CMT 2. The multiple R of these four 
variables with the criterion was .28, which 
accounts for only 8% of the SMI variance. 
Neither of the two social domain variables 
(DIT and SIT) was selected, and both of 
these had nonsignificant zero-order rs with 
the SMI (.11 and .09, respectively). 
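
The step from a multiple R of .28 to "only 8% of the SMI variance" is the squared multiple correlation. A one-function sketch; the function name is hypothetical.

```python
def variance_explained(multiple_r):
    """Proportion of criterion variance accounted for: R squared."""
    return multiple_r ** 2


share = variance_explained(0.28)
print(f"{share:.1%}")  # 7.8%
```

Rounded to the nearest percent this is the 8% reported above.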


Discussion 


This research demonstrated that the pu- 
tative domain of social intelligence lacks 
empirical coherency, at least as it is repre- 
sented by the measures used here. Two 
reasons might be advanced to account for 
these findings. The first trivializes them by 
noting that these perhaps were not good 
measures or did not come from the domain 
intended by the term social intelligence. It 
should be noted, however, that they are as 
well validated as any other similar measures 
(and better than most) and that the de- 
scriptions and validation efforts for each of 
them are consistent with a reasonable inter- 
pretation of social competence or intelli- 
gence. 

The second reason is more interesting. 
Consider that, although a single coherent 
social domain did not emerge, the portion of 
unexplained variance for each of the social 
measures was somewhat lower than for the 
more traditional academic intelligence 
measures. This is especially true for the 
SMI. It may be then that careful research 
attempting to define the domain more 
clearly, and to specify the skills and abilities 
included more precisely, might eventually 
generate sufficient common variance to es- 
tablish empirically a coherent behavioral and 
cognitive domain. In particular, it may be 
that the very format of such measures (i.e., 
paper-and-pencil format with delimited re- 
sponse options) activates an academic 
framework so constraining that relevant 
social skills contribute little true variance to 
the resulting scores. Accurate assessment 
of social competence may require a different 
approach to measurement, presumably one 
that capitalizes on systemic in situ observa- 
tion. 

This study implies several cautions for 
research and practice. First, use of the 
construct of social competence or social in- 
telligence as if it were a clearly defined do- 
main seems unwise in the absence of con- 
firming evidence. What is intended by these 
terms might be a domain so fractionated that 
it makes no sense to generalize across dif- 
ferent skills, or it may be—despite expecta- 
tions to the contrary—a rather uninteresting 
extension of skills and abilities already 
tapped by standard academic ability mea- 
sures. Second, additional research into so- 
cial competence, both behavioral and cog- 
nitive, is clearly necessary. We know too 
little about this putative domain to draw 
firm conclusions about its nature and 
structure. Third, educational innovations 
(i.e., new curricula and programs) designed 
to tap this dimension of ability, either to 
foster it directly or to use it to facilitate de- 
velopment in other areas (e.g., academic 
achievement) need, in the absence of sound 
evidence, to be explicit about the kinds of 
skills or abilities to be investigated and used 
and the specific outcomes expected. 


Reference Notes 


1. Rest, J. R. Manual for the Defining Issues Test. 
Unpublished manuscript, University of Minnesota, 
1974. 

2. Kohlberg, L. Issue-scoring guide. Unpublished 
manuscript, Harvard University, June 1972. 


References 


Chapin, F. S. Preliminary standardization of a social 
insight scale. American Sociological Review, 1942, 
7, 214-225. 

Dye, N. W., & Very, P. S. Growth changes in factorial 
structure by age and sex. Genetic Psychology 
Monographs, 1968, 78, 55-88. 

Flapan, D. Children's understanding of social inter- 
action. New York: Teachers College Press, 1968. 

Flavell, J. H., Botkin, P. T., Fry, C. L., Jr., Wright, J. W., 
& Jarvis, P. E. The development of role-taking and 
communication skills in children. New York: 
Wiley, 1968. 

Gough, H. G. Appraisal of social maturity by means of 
the CPI. Journal of Abnormal Psychology, 1966, 
71, 189-195. 

Gough, H. G. Manual for the Chapin Social Insight 
Test. Palo Alto, Calif.: Consulting Psychologists 
Press, 1968. 

Gough, H. G. Manual for the California Psychological 
Inventory. Palo Alto, Calif.: Consulting Psychol- 
ogists Press, 1969. 

Kurtines, W., & Greif, E. The development of moral 
thought: A review and evaluation of Kohlberg's 
approach. Psychological Bulletin, 1974, 81, 453- 
470. 

Mednick, S. A., & Mednick, M. T. Manual for the 
Remote Associates Test. Boston: Houghton Mif- 
flin, 1967. 

Neisser, U. General, academic, and artificial intelli- 
gence. In L. Resnick (Ed.), The nature of intelli- 
gence. Hillsdale, N.J.: Erlbaum, 1976. 

Nie, N. H., Hull, C. H., Jenkins, J. G., Steinbrenner, K., 
& Bent, D. H. Statistical package for the social 
sciences (2nd ed.). New York: McGraw-Hill, 
1975. 

Raven, J. C. Guide to the Standard Progressive Ma- 
trices. London: H. K. Lewis, 1960. 

Selman, R. L. Toward a structural analysis of devel- 
oping interpersonal relations concepts: Research 
with normal and disturbed preadolescent boys. In 
A. D. Pick (Ed.), Minnesota symposia on child 
psychology (Vol. 10). Minneapolis: University of 
Minnesota Press, 1976. 

Stanley, J. C. Reliability. In R. L. Thorndike (Ed.), 
Educational measurement (2nd ed.). Washington, 
D.C.: American Council on Education, 1971. 

Terman, L. M. Concept Mastery Test. New York: 
Psychological Corporation, 1950. 




Received April 25, 1977 


Journal of Educational Psychology 
1978, Vol. 70, No. 2, 224-230 


Self-Pacing Versus Instructor-Pacing: 
Achievement, Evaluations, and Retention 


Colleen F. Surber 
University of Illinois 
at Urbana-Champaign 


Edward K. Morris 
Department of Human Development 
and Family Life 
University of Kansas 


Sidney W. Bijou 
University of Arizona 


Student procrastination is an important concern in personalized systems of 
instruction. This study compared progress made on course work by two 
groups of students—one self-paced (n = 75) and one instructor-paced (n = 
74)—along with measures of course achievement and evaluations and a 9- 
month content retention test. Results showed that even though the self- 
paced group procrastinated, while the instructor-paced group did not, both 
scored similarly on pre-, post-, and retention tests and were equally satisfied 
with the course. Moreover, no differences were found in the number of units 
completed, final grade distributions, or course withdrawal rates. The with- 
drawal rate data and the tendency for the self-paced group to score better on 
the retention test are discussed in terms of educational objectives. 


For educators and students alike, the 
self-pacing component of personalized sys- 
tems of instruction (PSI) has traditionally 
been one of its most popular features (Car- 
roll, 1963; Keller, 1968; Kulik, Kulik, & 
Carmichael, 1974; Sherman, 1974; White- 
hurst & Whitehurst, 1975; Lloyd, Note 1). 
Self-pacing allows students to work at their 
own rates and plan for competing assign- 
ments from other courses. In the end, it 
serves to promote uniform subject mas- 
tery. 

Recent research, however, indicates that 
self-pacing may not be one of the necessary 
conditions outlined by Keller (1968) that 
promotes effective learning (Bijou, Morris, 
& Parsons, 1976; Bitgood & Segrave, 1974; 
Burt, 1975; Lloyd, 1971; Lloyd & Knutzen, 
1969; Mawhinney et al., 1971; Miller, 
Weaver, & Semb, 1974; Semb, Conyers, 
Spencer, & Sosa, 1974; Sutterer & Holloway, 
1974). For example, within PSI courses 
students who are permitted to self-pace are 
more likely to withdraw than those who are 
under instructor-paced contingencies (e.g., 
Semb et al., 1974). Moreover, in self-paced 
PSI courses (a) rates of test taking decline 
over a semester until students cram toward 
the end (Atkins & Lockhart, 1976; Burt, 
1975; Lloyd & Knutzen, 1969; Lloyd, 
McMullin, & Fox, in press; Mawhinney et al., 
1971; Robin & Graham, 1974); (b) under- 
graduate teaching assistants are inefficiently 
used; (c) study centers become overcrowded; 
and (d) mastery criteria sometimes deteri- 
orate (Semb et al., 1974; Sutterer & Hollo- 
way, 1974). 

To counteract these problems, some in- 
structors have implemented required in- 
structor-paced schedules for student prog- 
ress (Lloyd, 1971; Malott & Svinicki, 1969; 
Miller et al., 1974; Stalling, 1971; Sutterer & 
Holloway, 1974), while others have intro- 
duced more flexible combinations of in- 
structor- and student-paced point systems 
(Bijou et al., 1976; Bitgood & Segrave, 1974; 
Burt, 1975; Powers, Edwards, & Hoehle, 
1973; Semb et al., 1974). Both types of 
pacing systems have been found effective in 
reducing student procrastination and with- 
drawal (Semb et al., 1974). However, we 
have few comparisons between self-pacing 
and instructor-pacing in terms of student 
achievement. The available data indicate 
that neither learning (Atkins & Lockhart, 
1976; Bitgood & Segrave, 1974; Burt, 1975; 
Lloyd et al., in press; Robin & Graham, 1974; 
Semb et al., 1974) nor course satisfaction 
(Bitgood & Segrave, 1974; Robin & Graham, 
1974; Semb et al., 1974) is affected by 
whether students self-pace or meet an in- 
structor's pacing requirements. Given that 
no differences are found, the logistics of 
teaching assistant workloads and efficient, 
effective student management seem to favor 
the use of instructor-paced teaching sys- 
tems. 

Despite the equivalence of final exami- 
nation scores and course evaluations, little 
is known about retention of material fol- 
lowing course completion. A few studies 
have shown that students learning under a 
PSI system retain knowledge repertoires 
better than those learning under a lecture- 
discussion format (Cole, Martin, & Vincent, 
1974; Cooper & Greiner, 1971; Corey & 
McMichael, 1974; Moore, Hauck, & Gagné, 
1973), but these differences may reflect dif- 
ferential content acquisition more than they 
do differential retention per se (Lloyd, 

Only one study (Robin & Graham, 1974) 
has been reported that compares the content 
retention by students whose pacing is evenly 
regulated by pacing contingencies with 
content retention by students who self-pace. 
In the Robin and Graham study, no reten- 
tion differences were found; however, in- 
terpretation of the results was complicated 
by student self-selection, small comparison 
groups, differential withdrawal rates, and a 
retention interval of only 3 weeks. Thus, the 
present study was designed both to replicate 
research comparing self-paced to flexible 
instructor-paced approaches and to extend 
these findings to a more meaningful follow- 
up assessment of content retention. 


Method 


Subjects 


One hundred forty-nine students were enrolled in a 
PSI section of an introductory child development 
course. Freshmen, sophomores, juniors, and seniors 
were randomly assigned to the self-paced and instruc- 
tor-paced contingencies. Seventy-five students were 
placed in the former group and 74 in the latter. 


General Procedures 


Course materials² were divided into 15 units of ap- 
proximately equal size, 1 for each week of the semester. 
At the completion of each unit’s assignment, students 
came individually to a study center where they were 
required to pass a 10-item, short-answer essay quiz and 
an oral examination; both were graded by undergrad- 
uate teaching assistants. Ninety percent mastery was 
required. If one question was missed, it was included 
on the quiz for the next unit; if more than one question 
was missed, a make-up quiz was required. 
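
The unit-quiz mastery rule just described can be sketched as a small decision function. This is an illustration under stated assumptions: the function name and the outcome labels are hypothetical, and the sketch covers only the 10-item written quiz, not the oral examination.

```python
def quiz_outcome(missed, items=10):
    """Apply the 90% mastery rule to a 10-item unit quiz.

    0 missed: the unit is passed.
    1 missed: the unit is passed and the missed question is
              carried over to the next unit's quiz.
    2 or more missed: a make-up quiz is required.
    """
    if not 0 <= missed <= items:
        raise ValueError("missed must be between 0 and the number of items")
    if missed == 0:
        return "pass"
    if missed == 1:
        return "pass-with-carryover"
    return "make-up"


print(quiz_outcome(1))
```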


Experimental Manipulations 


The course syllabi given to the students in the self- 
paced and instructor-paced groups were identical except 
for the section describing grading procedures. 

Self-paced condition. Students assigned to the 
self-paced group were permitted to complete the course 
at their own rate within the semester's time. A stu- 
dent’s final grade was based solely on the number of 
units mastered: 15 units = A, 14 units = B, 13 units = 
C, and 12 units or less = F. 

Instructor-paced condition. Students assigned to 
the instructor-paced group worked within a flexible 
point system. As with students in the self-paced group, 
they could proceed as quickly as they desired; the 
number of units completed had the same relationship 
to their final grade. However, these students also had 
to meet a point criterion which generally required them 
to master at least one unit of material each week. 
Failure to meet this criterion resulted in a one-letter 
drop in grade (i.e., from A to B, B to C, etc.). Com- 
pleting each unit by the Thursday or Friday of its re- 
spective week earned the students enough points to 
meet the point criterion; however, the system was 
flexible in that more points could be earned for passing 
unit quizzes earlier in a week (see Table 1). Thus, the 
student who fell behind had some opportunity to make 
up points by mastering subsequent units on Mondays, 
Tuesdays, or Wednesdays of the following weeks. If a 
student failed to acquire the requisite number of points, 
the grade could still be maintained by completing a term 
paper of A quality with one opportunity for a revi- 
sion. 


Table 1 
Instructor-Pacing Point Schedule 

Variable                          Monday  Tuesday  Wednesday  Thursday  Friday 
Points for quiz x in week x         10      10         9          8        8 
Points for quiz x in week x + 1      5       -         -          -        - 


1 See Bijou et al. (1976) for a more detailed descrip- 
tion of course management procedures, especially in 
regard to the pacing system. 

2 Course materials included Child development: The 
basic stage of early childhood (Bijou, 1976); Child de- 
velopment I: A systematic and empirical theory 
(Bijou & Baer, 1961); Child development II: Universal 
stage of infancy (Bijou & Baer, 1965); Child develop- 
ment: Readings in experimental analysis (Bijou & 
Baer, 1967); and Course guide (Bijou, Note 2). 
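
The grading rules for the two conditions and the point schedule of Table 1 can be sketched together. Several details here are assumptions rather than statements from the text: the function and constant names are hypothetical, the blank cells of Table 1 are treated as earning 0 points, a quiz more than one week late is treated as earning 0 points, and a one-letter drop from C is assumed to lead to F because the grade scale lists no D.

```python
GRADE_LADDER = ["A", "B", "C", "F"]

SAME_WEEK_POINTS = {"Mon": 10, "Tue": 10, "Wed": 9, "Thu": 8, "Fri": 8}
NEXT_WEEK_POINTS = {"Mon": 5}  # other days: no entry in Table 1


def base_grade(units_mastered):
    """Grade from units mastered: 15 = A, 14 = B, 13 = C, 12 or fewer = F."""
    if units_mastered >= 15:
        return "A"
    if units_mastered == 14:
        return "B"
    if units_mastered == 13:
        return "C"
    return "F"


def quiz_points(day, weeks_late=0):
    """Points for passing quiz x on `day`, 0 or 1 weeks after week x."""
    if weeks_late == 0:
        return SAME_WEEK_POINTS[day]
    if weeks_late == 1:
        return NEXT_WEEK_POINTS.get(day, 0)  # assumption: blank cell = 0
    return 0  # assumption: later weeks earn nothing


def instructor_paced_grade(units_mastered, met_point_criterion,
                           a_quality_term_paper=False):
    """Base grade, dropped one letter if the point criterion is missed
    and no A-quality term paper makes up for it."""
    grade = base_grade(units_mastered)
    if met_point_criterion or a_quality_term_paper:
        return grade
    index = GRADE_LADDER.index(grade)
    return GRADE_LADDER[min(index + 1, len(GRADE_LADDER) - 1)]


print(instructor_paced_grade(15, met_point_criterion=False))
```

In the self-paced condition only `base_grade` applies; the point criterion and the term-paper option belong to the instructor-paced condition alone.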


Achievement Measures 


The primary achievement measure was student 
performance on a 53-item multiple-choice test. Three 
or four items from all but the first unit were included 
and randomly ordered. Because the weekly unit quiz 
was of the short-answer, essay variety, none of the 
questions on the criterion test was the same, even 
though the material covered was identical. 

The multiple-choice test was administered as a pre- 
test during the students’ first visit to the study center 
when they turned in their quiz for the first unit, a take 
home. A rerandomized posttest was administered 
immediately after each student’s completion of the last 
unit of the course. The students were informed each 
time that their performance on the test would not affect 
their final grade. 

A course evaluation questionnaire was also admin- 
istered upon completion of the course. Questions were 
designed to cover the students’ (a) satisfaction with 
specifics of course organization (i.e., the oral and written 
quizzes, grading procedures, teaching assistants, and 
optional weekly lectures) and (b) general reactions to 
the course. 

Nine months following completion of the semester, 
all students were contacted by mail and offered $2 to 
take a follow-up test on which items from the original 
were again rerandomized. In addition to this monetary 
inducement, when the students arrived, they were in- 
formed that they could earn an extra 2¢ for each ques- 
tion answered correctly. 


Results 


Of the 149 students originally enrolled in 
the course, 127 received final grades at the 
end of the semester, 63 in the self-paced 
group and 64 in the instructor-paced group. 
Three students had to be given incompletes 
for medical excuses (two from the self-paced 
group and one from the instructor-paced 
group) while 19 dropped the course. How- 
ever, there was no difference in course 
withdrawal rate between the self-paced and 
instructor-paced groups, as they lost 9 
(12.8%) and 10 (13.7%) students, respec- 
tively. Nor were there any differences in the 
final grade distributions: over 90% of the 
students in both groups received an A. At 
the end of the course, the average number of 
units completed by each group was virtually 
the same: self-paced group = 14.77 and in- 
structor-paced = 14.95. Finally, the two 
groups performed almost identically on the 
53-item pretest. The mean number of cor- 
rect items for the self-paced group was 20.5, 
while it was 20.3 for the instructor-paced 
group. 

There was, however, a statistically sig- 
nificant difference between the two groups 
in terms of the number of quizzes repeated 
over the course of the semester. Students in 
the self-paced group had to repeat 4.1% of 
their quizzes, whereas those in the instruc- 
tor-paced group had to repeat 7.2% of theirs, 
χ²(1) = 8.75, p < .01. 


Pacing 


To compare the rates of progress of the 
two groups, the semester was divided into 
15-day periods. The average number of 
units completed during each time period is 
shown in Figure 1. Only the first 4 of the 5 
time periods were used in a 2 × 4 (Pacing × 
Time) analysis of variance. The rationale 
for this was that since very few students 
failed to complete all 15 units, the informa- 
tion in the fifth time period is redundant 
with the first 4 (i.e., the score for Time Pe- 
riod 5 = 15 − sum of scores for Time Periods 
1 through 4). This treatment of the data 
provides a conservative test of the pacing 
difference. The results showed a significant 
main effect for pacing group, F(1, 125) = 
7.86, p < .01, and a significant Group × Time 
Period interaction, F(3, 375) = 5.30, p < .01. 
The significant interaction of pacing group 
and time period shows that the patterns of 
progress of the two groups through the 
course were not the same, as predicted by the 
contingencies. 


Figure 1. Mean number of units completed by the 5 
15-day time periods of the course for the self-paced and 
instructor-paced groups. 
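
The redundancy argument for dropping the fifth time period can be made concrete. The sketch below assumes a student who completed all 15 units, which the text says was true of nearly everyone; the function name and the per-period counts are hypothetical.

```python
TOTAL_UNITS = 15


def fifth_period_units(first_four_periods):
    """Units completed in Time Period 5, implied by Periods 1 through 4
    for a student who finished all 15 units."""
    if len(first_four_periods) != 4:
        raise ValueError("expected unit counts for exactly four periods")
    return TOTAL_UNITS - sum(first_four_periods)


# Hypothetical student: 2, 3, 3, and 3 units in Periods 1 through 4.
print(fifth_period_units([2, 3, 3, 3]))  # 4
```

Because the fifth score is fixed once the first four are known, it adds no independent information to the analysis of variance.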


Posttest Achievement Measure 


The mean scores on the posttest were 32.6 
for the self-paced group and 32.2 for the in- 
structor-paced group. When compared with 
the pretest scores, an analysis of variance 
showed that the main effect of the pretest- 
posttest was significant, F(1, 124) = 412.9, 
p < .001, while neither the main effect of the 
treatment (experimental conditions) nor the 
Treatment × Pretest-Posttest interaction 
approached significance. The performances 
of the two groups on the multiple-choice 
criterion tests increased significantly from 
the pretest to the posttest, but the two 
groups did not differ in their scores on either. 
In other words, it made no difference 
whether the students self-paced and pro- 
crastinated or whether they worked at an 
even rate under point incentives; they scored 
identically on the posttest achievement 
measure. 


Course Evaluations 


The course evaluation questionnaires 
completed at the end of the course yielded no 
differences in satisfaction on the dimensions 
evaluated. Both groups were equally posi- 
tive about the course, as all ratings ranged 
between 2.90 and 3.45 on a 4-point scale. 


Retention Achievement Measures 


Data were collected on 51 (40.2%) of the 
127 students who received final grades in the 
course, 27 (42.2%) from the self-paced group 
and 24 (38.1%) from the instructor-paced 
group. Figure 2 shows the retention test 
scores for these samples along with their 
pretest and posttest scores. The latter 
scores for these follow-up students were 
similar to those of their groups as a whole. 
Moreover, the final grade distributions of the 
two follow-up groups also matched those of 
their respective groups. 

Analysis of variance indicated a strong test 
(pre-, post-, retention) main effect, F(2, 98) 
= 114.9, p < .001, but that the Group × Test 
interaction was not statistically significant, 
F(2, 98) = 2.42. However, the p value for 
the Group × Test interaction was .094 and 
suggested that the self-paced group per- 
formed somewhat better on the retention 
test than did the instructor-paced students. 
Finally, no relationships of any merit were 
found between the retention scores and 
posttest scores or the retention scores and 
procrastination. 


Figure 2. Pretest, posttest, and retention achievement 
scores for the follow-up samples from the self-paced 
(S-P) and instructor-paced (I-P) groups. 


Discussion 


A comparison between a student self- 
paced instruction system and a flexible, in- 
structor-paced point system revealed that 
students procrastinated when they self- 
paced, yet proceeded evenly through course 
material when given incentives to do so. As 
noted elsewhere, students do not self-pace; 
they pace according to the conditions that 
control the pacing behavior (Bijou et al., 
1976). In most cases when we say a student 
self-paces, we are admitting that the condi- 
tions which produce pacing are unknown. 
These results are also in agreement with 
other research demonstrating that whether 
students self-pace or have their pacing reg- 
ulated, they score similarly on criterion 
measures of course achievement (Atkins & 
Lockhart, 1976; Bitgood & Segrave, 1974; 
Burt, 1975; Lloyd et al., in press; Robin & 
Graham, 1974; Semb et al., 1974) and are 
highly and equally satisfied with the ways in 
which they were instructed (Bitgood & Se- 
grave, 1974; Robin & Graham, 1974; Semb et 
al., 1974). 

In addition to the similarity of course 
achievement and course satisfaction mea- 
sures, the two groups showed no differences 
in (a) the number of units completed, (b) 
final grade distributions, or (c) course with- 
drawal rates. However, the self-paced group 
did have to repeat fewer quizzes, and this 
would seem to be an advantage. Why this 
difference occurred is not immediately ap- 
parent, but it could be that students took 
quizzes when they were prepared, rather 
than being forced to take them at the end of 
a week, prepared or not. 

The data indicating that the self-pacing 
component had no differential effect on 
course withdrawal are particularly inter- 
esting. The PSI courses generally have 
higher withdrawal rates than lecture-dis- 
cussion courses and the presumption is often 
that the self-pacing feature is the cause; 
however, there is only a small amount of 
empirical evidence for this latter claim (see 
Semb et al., 1974). Certainly the matter is 
not settled; the wide procedural variations 
from one PSI course to the next preclude a 
final answer. But the point can be made 
that self-pacing need not lead to greater 
student withdrawal. When it does, the 
other components of the PSI package should 
be scrutinized; indeed, they may be inter- 
acting with the self-pacing component, 
thereby inducing high withdrawal rates. 
This need not occur. 

Although the results show striking simi- 
larities between the two groups on depen- 
dent measures relevant to educational 
achievement, many educators would still be 
troubled over student procrastination in the 
self-paced group. Course management lo- 
gistics aside, cramming typically has been 
considered less desirable than regularly 
paced study; however, there are as yet no 
supportive data in the PSI literature for this 
conclusion (Burt, 1975). Therefore, the 
inclusion of a follow-up retention measure 
was a logical step for assessing possible dif- 
ferences. But no statistical retention dif- 
ferences were apparent. If anything, the 
data suggested that the benefit might go to 
the procrastinating self-paced group. Per- 
haps contingencies that allow self-pacing are 
important after all. 

In addition to this retention trend, the 
difference between the two groups in quiz 
repeat rates was in favor of the self-paced 
group; they failed significantly fewer quizzes. 
But when all other achievement measures 
are the same, it is difficult to know how to 
interpret a high or low repeat rate. One 
conclusion is that more students in the in- 
structor-paced group took quizzes before 
being adequately prepared. However, the 
supposedly aversive event of quiz failure did 
not seem to influence the course evaluation 
measures. On the other hand, it might be 
suggested that we did not measure the ap- 
propriate behaviors. Perhaps instructor- 
pacing and the quiz repeats are teaching 
students something else, something unre- 
lated to achievement. Perhaps they are 
teaching pacing skills that will be more im- 
portant to future learning than the content 
of any single course. 

Future PSI research should attempt to 
determine whether pacing skills, once ac- 
quired in a course, will then be applied to 
subsequent courses. However, a caveat 
needs to be entered. Analogous to the dif- 
ficulties of generalization from clinical and 
educational programs, pacing skills should 
not be expected to appear magically in other 
learning settings. They must be planned for 
and programmed. Perhaps instructor- 
paced systems would be part of the program, 
perhaps not. But we should begin to find 
out. If instructor-paced systems are not 
part of a learning-to-self-pace program, then 
we must examine the possible benefits of 
self-paced systems for content retention 
despite the course management problems 
they generate. 


Reference Notes 


Toward the Definition of a Domain of Academic Motivation 


Kenneth O. Doyle, Jr., and Ross E. Moen 


Measurement Services Center, University of Minnesota 


In the domain of academic motivation, distinctions such as statics versus
dynamics, rational versus empirical, and linear or "one shot" versus cyclic or
"bootstrapping" help clarify the focus of previous studies and suggest
guidelines for future studies. Illustrative of bootstrapping in researching the
structure of academic motivation is the present study's development of a 75-item
self-report Academic Motivations Inventory. In the current cycle of the
development of this instrument, factor analyses of the inventory produced a set
of nine interpretable dimensions which summarize a considerable portion of
the domain and which can be measured with adequate reliability. This
collection of factors is discussed in relation to prior studies of the structure
of the domain; further research toward the development of the domain is
specified.


Knowledge about the academic motivation of college students is potentially
useful in admissions decisions, guidance and adjustment counseling,
curriculum planning, instructional evaluation and development, and basic
educational research (Moen & Doyle, 1977). A scientific knowledge about
academic motivation requires from the start a definition of the domain, so
that measures can be developed and models and theories derived.

In defining such a domain, investigators can pursue knowledge about
"dynamics" or "statics." The study of dynamics involves "attempts to derive
causal laws or developmental sequences" (Eysenck, 1970, p. x), such as the
effects of motivation on academic progress and satisfaction or the
development of students' motivations. The study of statics, or structure,
involves "attempts to create taxonomies, classifications, or nosologies"
(Eysenck, 1970, p. x) which would differentiate among motivations such as the
desire for good grades and the desire for self-improvement.

In the pursuit of these kinds of knowledge, some investigators have
emphasized largely "rational" procedures and others have focused principally
on "empirical" procedures. The rational approach stresses "the importance of
theory, of rationally defining constructs, and of devising items in relation to
one's theory" (Riechmann & Grasha, 1974, p. 214; see also Jackson, 1971). The
empirical approach deals in known statistical and psychometric properties of
the elements (the items and scales) which define the domain (e.g., Shack,
1970; see also Dunnette, 1966). These considerations suggest a four-fold
table (statics/dynamics by rational/empirical) which can be used to clarify
the focus of previous studies and to guide continuing investigation.


Requests for reprints should be sent to Kenneth O. Doyle, Jr., Measurement
Services Center, University of Minnesota, 9 Clarence Avenue, S.E.,
Minneapolis, Minnesota 55414.

Some prior studies have dealt within a
single cell of this table. Mandler and Sarason's
(1952) Test Anxiety Questionnaire, for
example, is a product of a rational-dynamic 
study, while Brown and Holtzman's (1953) 
initial development of the Survey of Study 
Habits and Attitudes as a predictor of un- 
derachievement illustrates the empirical- 
dynamic approach. Riechmann and Grasha 
(1974) used the rational approach to inves- 
tigate the structure of a motivational do- 
main, while Cohen and Guthrie (1966) 
studied statics empirically. 

Most studies, however, have dealt with 
two or more of the cells. The many inves- 
tigations focusing on a few select motivations 
such as curiosity (Berlyne, 1954), anxiety 
(Gaudry & Spielberger, 1971), and achieve- 
ment, affiliation, and power (McKeachie, 
1961) have typically used both rational and 
empirical methods to study the dynamics of 
motivations. Similarly, Frymier (1970) used 
both methodologies in his study of general
motivational dynamics. A small number of 
studies such as Chiu’s (1969) and Shack’s 
(1970) have drawn on both kinds of methods 
to study motivational structure. Some have 
used one method to study both aspects (e.g., 
Morstain, 1973), others have used one 
method to study structure and another to 
study dynamics (e.g., Borow, 1949; Brown & 
Holtzman, 1966); an occasional study has 
used both methods to study both aspects 
(e.g., Farquhar, 1963; Stern, 1970). 
When both rational and empirical 
methodologies are used to explore either 
structure or dynamics, the two methods can 
be combined either linearly or cyclically. 
The linear combination is illustrated by the 
“one-shot” data interpretation study, the 
hypothesis testing study, and the instrument 
development study, examples of each of 
which appear frequently in the academic 
motivations literature. The cyclical or it- 
erative combination of methodologies is 
similar to the “bootstraps” procedure de- 
scribed by Cronbach and Meehl (1955) in 
which a series of test-and-modify iterations 
can increase the meaningfulness and preci- 
sion of a construct. Weiner’s (1972) re- 
finements of achievement motivation and 
locus of control theory illustrate the cyclical 
combinings of rational and empirical 
methodologies which are not uncommon in 
research on the dynamics of specific aca- 
demic motivations. A few examples of it- 
erative combinations in statics studies can 
be found in other domains (e.g., Loevinger 
& Wessler, 1970; Tellegen, Note 1), but they 
are exceedingly rare in general studies of the 
structure of the academic motivations do- 
main. 

The present study is a major part of an 
on-going rational and empirical examination 
of the dynamics and structure of academic 
motivations. This report focuses on the 
cyclical, structural development of a self- 
report measure called the Academic Moti- 
vations Inventory. For reasons of space, the 
iterations cannot all be described in their 
entireties; instead, the most current cycle is 
described in detail and prior phases more 
briefly summarized. Earlier phases are 
more completely described in Moen and 
Doyle (1977, in press) and Moen (Note 2). 
Method


First Cycle 


The first cycle involved development of a model of the
structure of academic motivations, an empirical study
of the structure defined by that model, and revision of 
the model as a result of that empirical study. The 
model was derived by integrating the findings of prior 
investigators with new information drawn from exten- 
sive interviews with students, faculty, and counseling 
and clinical psychologists. A 176-item inventory con- 
structed on the basis of this model was subjected to item 
and covariation analyses. While these analyses tended 
to confirm the essential characteristics of the model, 
they also suggested certain revisions. Some elements 
of the model were deleted because the items they 
yielded had little discriminatory power. Other ele- 
ments were combined because high correlation among 
representative items suggested redundancy. Some 
elements were redefined because their items showed 
different correlational patterns than had been
predicted.


Second Cycle 


The second cycle involved developing a much shorter
version of the inventory, 60 items, based on the revised
model. Factor analyses of this instrument produced
eight interpretable factors which seemed congruent with
most of the principal dimensions of the revised model
but which also suggested further deleting, combining,
and redefining of some of the model's elements. Certain
theoretically important elements, however, such
as the needs for esteem and affiliation, failed to emerge
clearly from the factor analyses. These omissions
seemed consistent with fully a third of the 60 items
failing to load substantially on any of the eight
factors.


Current Cycle 

The third and current cycle of research involved (a) the
development of a third version of the Academic Motivations
Inventory, with some modifications made to
reflect changes in the model and others to represent
elements of the model which were insufficiently represented
in the prior versions; (b) structural (factor)
analysis of this instrument; and (c) examination of the
reliability (homogeneity) of the factors. The subsequent
discussion compares these findings with the findings of
other structural research in this domain and explores
the implications of these findings for further
research.


Instrument 


For the third iteration, a 75-item inventory was
developed with some items taken from the initial item
pool, some rephrased, and some newly written. Each
item was cast as a simple declarative sentence and was
accompanied by a 5-point Likert-type scale, "Not at all
true of me" through "Extremely true of me."


Subjects 


The subjects were 626 students in a wide variety of
courses at the University of Minnesota who completed 
the inventory at various times during the 1976-1977 
academic year. 


Procedure 


After having been advised of the purpose of the study 
and urged to participate thoughtfully, the students 
completed the inventory in class. They signed their 
names to their answer sheets but were promised strict 
confidentiality of results. Class results were reported
to instructors as summary statistics, without individual
attribution; one of the investigators went over the
results with the instructor. Often, the instructor and
investigator discussed the summary results with the
class.


Results 


Factor Analyses 


Product-moment correlations were computed
between items across respondents.
To ensure that the intercorrelation matrix
contained enough information for a meaningful
factor analysis, Bartlett's test for an
identity matrix was computed (Bartlett,
1950; Weiss, 1971). The matrix was significantly
different from an identity matrix of
the same dimensions, χ²(2775) = 12,256.30,
p < .001. The matrix was then submitted to
principal factors analysis, with squared
multiple correlations as the initial communality
estimates, and then rotated to a varimax
solution.
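
The test described above can be sketched in a few lines. This is a minimal illustration, not the authors' computation: it assumes the usual form of Bartlett's statistic, chi-square = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom, where R is the p-by-p correlation matrix and n the number of respondents. The function names and the toy matrix are hypothetical. For a 75-item inventory the degrees of freedom come to 75 x 74 / 2 = 2775, which matches the value reported.

```python
import math


def log_determinant(matrix):
    """Log-determinant of a symmetric positive-definite matrix (Cholesky)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
                # det(R) is the squared product of the Cholesky diagonal.
                log_det += 2.0 * math.log(lower[i][j])
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return log_det


def bartlett_sphericity(corr, n_respondents):
    """Return (chi_square, degrees_of_freedom) for an item correlation matrix."""
    p = len(corr)
    multiplier = (n_respondents - 1) - (2 * p + 5) / 6.0
    chi_square = -multiplier * log_determinant(corr)
    degrees_of_freedom = p * (p - 1) // 2
    return chi_square, degrees_of_freedom


# Toy example (assumed data): two items correlating .50, 626 respondents.
toy_corr = [[1.0, 0.5], [0.5, 1.0]]
toy_chi_square, toy_df = bartlett_sphericity(toy_corr, 626)
```

A statistic near zero means the matrix is close to an identity matrix and carries little shared variance to factor; a large value relative to its degrees of freedom indicates the opposite.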

The number of factors to be extracted was
established by a judgmental combination of
three criteria: (a) the point at which
eigenvalues greater than 1.0 dropped
dramatically; (b) the point at which subsequent
factors seemed to lose interpretability; and
(c) the point at which the amount of variance
accounted for by the factors approximated
that from a parallel factor analysis of a
same-sized matrix of random data (Humphreys
& Ilgen, 1969; Weiss, 1971). These
criteria suggested the retention of nine factors,
the tentative names for and descriptions
of which appear in Table 1. The percentage
of common variance accounted for
by each of these factors is also given in Table 1.

Seven items failed to load above .30 on any
of these factors. These items deal with low 
tolerance for dissonance and ambiguity and 
with the desires to be autonomous, to gain 
esteem, and to help others. These items 
suggest the possibility of additional factors 
still too unstable to emerge. 


Reliability Analyses 


The responses from the factor analysis 
sample were also used to compute internal 
consistency reliabilities for scales repre- 
senting each of the nine stable factors. 
Items loading .30 or more on a factor were 
unit weighted, regardless of factor loading, 
and analyzed as a scale for internal consis- 
tency. The alpha reliabilities for each of the 
nine scales also appear in Table 1. They 
range from .52 to .87, the size of the coefficients
generally corresponding to the number
of items in the scale. These reliabilities
can be considered conservative because they 
could be increased by the addition of items 
and/or by the application of various re- 
sponse-weighting procedures (e.g., reciprocal 
averages weighting; see Weiss, 1976). 
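
The scale construction and reliability estimate described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it assumes that the .30 criterion applies to the absolute loading, that items with negative loadings are reverse-scored on the 5-point scale, and that alpha is Cronbach's coefficient computed from sample variances. All function names and data are hypothetical.

```python
from statistics import variance


def build_unit_weighted_scale(responses, loadings, threshold=0.30, scale_max=5):
    """Keep items whose absolute loading meets the threshold, weight each 1.

    Assumption: negatively loading items are reverse-scored so that every
    retained item points in the same direction before summing.
    """
    kept = [i for i, loading in enumerate(loadings) if abs(loading) >= threshold]
    scale = []
    for respondent in responses:
        row = []
        for i in kept:
            value = respondent[i]
            if loadings[i] < 0:
                value = scale_max + 1 - value  # reverse a 1..scale_max rating
            row.append(value)
        scale.append(row)
    return scale


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents' item-score rows."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [
        variance([row[i] for row in item_scores]) for i in range(k)
    ]
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data (assumed): four respondents, three items, loadings on one factor.
toy_responses = [[1, 3, 5], [2, 3, 4], [4, 2, 2], [5, 3, 1]]
toy_loadings = [0.55, 0.10, -0.45]
toy_scale = build_unit_weighted_scale(toy_responses, toy_loadings)
toy_alpha = cronbach_alpha(toy_scale)
```

Because alpha rises with the number of items when inter-item correlations are held constant, short scales such as the three- and four-item factors would be expected to show the lower coefficients, which is the pattern the text notes.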


Discussion 


Successive iteration of rational and em- 
pirical procedures produced a set of factors 
which summarize a considerable portion of 
the domain of academic motivations and 
which provide at least the promise of reliable 
measurement. Most of these factors are 
convergent with the findings of prior studies.
Three or four of these factors seem similar to
dimensions identified by more than one 
other investigator. Dimensions similar to 
"Enjoyment of Learning" are Shack's (1970) 
Curiosity Indulgent factor, Morstain's (1973) 
Inquiry factor, Chiu's (1969) Curiosity fac- 
tor, and Riechmann and Grasha's (1974) 
Participant scale. “Anti-school” seems 
similar to Shack's Ambivalent factor, to 
Riechmann and Grasha's Avoidant scale, 
and to the inverse of Chiu's Positive Orien- 
tation to Learning factor. “Desire for Career 
Preparation" is similar to Shack's Future 
Oriented factor and to Morstain's Achievement
factor. "Desire for Academic Success"
is highly similar to parts of Chiu's very broad
Positive Orientation to Learning and Morstain's
Achievement factors, each of which,
as already noted, also bears relation to another
factor in the present study.


Table 1
Factors of Academic Motivation with Illustrative Items, Percentages of Common Variance, and
Scale Reliabilities

Factor (number of items)                    % common    Scale internal
  Illustrative highest-loading items        variance    consistency
  (loadings)

I. Desire for Self Improvement (15 items)     33.8        .87
  I hope school will make me feel able to handle whatever comes up (.78)
  I hope school will help me become a better person (.65)
  I hope school will teach me better ways of handling conflicts with people (.65)

II. Anti-School (17 items)                    22.7        .82
  School gives me a feeling of insignificance that I hate (.61)
  Grades or degree requirements put restrictions on me that I hate (.55)
  I dislike most schoolwork (.55)

III. Desire for Esteem (14 items)             12.0        .81
  I worry that others might think something I do or say in class is stupid (.54)
  I hope what I learn in school will make others pay more attention to me (.52)
  I worry that others might think badly of me if it seems I don't try to do my
    best in school (.52)

IV. Enjoyment of Learning (11 items)           9.6        [illegible]
  It is very important that my classes make me use my mind (.55)
  I enjoy reading most books and articles my teachers assign (.50)
  I just really enjoy learning new things (.45)

V. Enjoyment of Assertive Interactions         6.5        [illegible]
   (7 items)
  I enjoy matching wits with others in school (.60)
  I sometimes enjoy having a good hot argument in class (.57)
  I enjoy looking for the flaws in an argument (.48)

VI. Resentment of Poor Teaching (3 items)      4.3        .58
  I resent being given assignments which I think are purposeless (.57)
  Poorly done lectures, books, etc. really irritate me (.54)
  I get upset by teachers who will not seriously consider other points of
    view (.38)

VII. Desire for Academic Success (4 items)     4.1        .63
  I am enthusiastic about trying to get high scores or grades (.49)
  If I have started on something, it is very important for me to complete it (.41)
  I try to do my very best on all my schoolwork, even on things I don't find
    interesting (.45)

VIII. Desire for Career Preparation            3.6        [illegible]
   (4 items)
  I need high scores, high grades, or a degree to help me get the position I
    want (.42)
  I am in school because I expect to learn things that will make me better at
    the job I want (or one I already have) (.40)
  School is a way of preparing myself so I can accomplish significant
    things (.30)
  Preparation for a career is not one of my main objectives in school (-.56)

IX. Enjoyment of Passive Interactions          3.3        .52
   (5 items)
  I like to learn by hearing what other students think (.57)
  An important part of school is being around people I like (.50)
  It is very important for me to be able to feel friendly with my
    instructors (.30)

At least three, and perhaps all five, of the
remaining factors from the present study
seem similar to dimensions identified by a
single other investigator. "Desire for Self
Improvement" is similar to Shack's
Self-explorative factor, "Desire for Esteem" to
Chiu's Need for Social Recognition and
Reaction to Expectation factors, and "Enjoyment
of Passive Interactions" to Riechmann
and Grasha's Avoidant scale.

"Enjoyment of Assertive Interactions"
bears some resemblance to Riechmann and
Grasha's Competitive scale, although the
latter seems to focus much more on "winning"
as indexed by grades and other external
reinforcers than on enjoyment of the
actual process of matching wits with others.
The dimensions closest to "Resentment of
Poor Teaching" are Riechmann and Grasha's
Avoidant scale and that portion of
Chiu's Positive Orientation to Learning
factor which has already been related to the
Anti-school factor. However, "Resentment
of Poor Teaching" seems not to reflect a
general dissatisfaction or tendency away
from school but rather the extent to which
students who otherwise may be enjoying
school are offended by what they consider
poor teaching.

Several prior studies have identified dimensions
which were not found as factors in
the present study. Some of these prior factors
seem to have little bearing on academic
motivation. Morstain, for example, who
describes his factors as student "orientations"
rather than motivations, identified
factors which involve extracurricular,
[illegible], and social orientations. But four
dimensions identified by prior studies
suggest gaps in the present study's set of
factors. Shack's Grade Dependent factor,
Morstain's Assessment factor, and Riechmann
and Grasha's Competitive scale all
seem to converge on external evaluation of
performance. Shack's Constriction-resistant
and Independent-informal Scholar
factors, Morstain's Interaction and Independent
Study factors, and Riechmann and
Grasha's Independent scale seem to dictate
a clear need for an autonomy dimension.
The converse dimension, need for structure,
is suggested by Shack's Pressure Reactive
factor, Morstain's Assignment Learning
factor, and Riechmann and Grasha's Dependent
scale. Finally, Chiu's Motive to
Avoid Failure points to another kind of
motivation which affects some students who
want to do well in school.
Future research on the Academic Moti- 
vations Inventory should attend to these 
four dimensions. Since our inventory al- 
ready contains some items related to each of 
those factors, insertion of a few more such 
items should permit these factors to emerge. 
Additional research, including still more it- 
erations of rational and empirical metho- 
dologies, should also prove necessary as the 
project probes more deeply into motivational 
dynamics and as attempts are made to go 
beyond the possibly restrictive self-report 
basis of the Academic Motivations Inven- 
tory. Nevertheless, it seems that the project 
thus far, in combination with the work of 
other investigators, has provided a more 
complete structuring of the academic moti- 
vations domain than previously available 
and has produced an instrument with rea- 
sonable measurement potential. 
Residential Satisfaction and Friendship Formation
in High- and Low-Rise Student Housing:
An Interactional Analysis


Charles J. Holahan and Brian L. Wilcox


University of Texas at Austin


The purpose of this study was to investigate residential satisfaction and
friendship formation in high- and low-rise student housing from an
interactional perspective. The study involved (a) a comparison of residential
satisfaction and friendship formation in high- and low-rise dormitories and (b) an
analysis of the interaction between social competence and type of
environment in affecting both residential satisfaction and friendship formation.
Subjects were 129 freshmen living in university student housing. Results
showed that residents of low-rise dormitories were significantly more satisfied
and established more dormitory-based friendships than residents of a
megadorm setting. In addition, a number of interactions were found between
social competence and both type of environment and sex of the student, which
affected residential satisfaction and the development of dormitory-based
friendships.


A number of research studies concerned
with quality of life in student residential
environments have reported less living
satisfaction and social cohesion in high-rise
megadorms in contrast to low-rise dormitory
settings. Findings have pointed to a lower
level of students' perception of social support
and cohesiveness (Wilcox & Holahan, 1976)
and less prosocial behavior and cooperation
(Bickman et al., 1973) in high-rise as
opposed to low-rise student housing. Crowding
in dormitory settings has been demonstrated
to be related to increased stress along
with decreased social contact (Valins &
Baum, 1973), more negative ratings of living
space (Eoyang, 1974), and more negative
interpersonal attitudes (Baron, Mandel,
Adams, & Griffen, 1976).

Although these findings from previous
research are of considerable practical
importance to university educators and
administrators, a further question needs to be
asked that is of both practical and theoretical
significance to the educational psychologist.
Are different types of students differentially
affected by contrasting residential settings?
This concern is of particular importance in
the light of a growing body of investigation
dealing with the issue of interactionism in
personality research. A number of studies
(Bem & Allen, 1974; Ekehammar, 1974;
Endler & Hunt, 1968; Kjerulff & Wiggins,
1976; Mischel, 1973; Moos, 1973) concerned
with investigating personal versus situational
effects in predicting social behavior
have found that the interactions between
personal and environmental variables
account for an especially large proportion of
the total behavioral variance.

The purpose of the present study was to
extend the investigation of residential
satisfaction [...]
of students and types of residential settings.
Specifically, the study involved (a) a
comparison of residential satisfaction and
friendship formation in high- and low-rise
dormitory settings and (b) an analysis of the
interaction between students' levels of social
competence and type of environment in
affecting both residential satisfaction and


[...] in preparing the manuscript and to
[...] and James Spearly for their assistance
in the data analysis.
Requests for reprints should be sent to Charles J.
Holahan [...]


friendship formation. It was predicted that 
(a) according to previous research in student 
residential environments, high-rise dormi- 
tories would be characterized by lower levels 
of satisfaction and friendship development 
than would low-rise dormitories and (b) 
based on earlier interactional studies, the 
interactions between type of environment 
and social competence would be of particular 
importance in predicting both satisfaction 
and friendship formation. 


Method 


Subjects 


Subjects in the study were 129 randomly selected
second-semester freshmen who were residents of on- 
campus university housing at a state university. The 
sample included 55 males and 74 females. Only one 
subject was sampled from each room to avoid poten- 
tially correlated scores between roommates. Only 
freshmen were used in order to limit self-selection for 
specific living arrangements likely to occur for advanced 
students with prior living experience on campus. 
Freshmen on the campus studied were typically ran- 
domly assigned to the different dormitory settings; 
where such random assignment clearly did not exist, 
residents were excluded from the present analysis. 
Nevertheless, the sample cannot be assumed to be en- 
tirely random; in order to examine for any possible 
systematic population bias across dormitories, the 
subject samples from each dormitory were compared 
along a number of demographic indicators (family 
economic level, racial distribution, and geographic 
background) and the social competence measure. 
There were no significant differences between dormi- 
tories on any of these measures. 


Environmental Settings 


Two types of student residential environments were 
compared in the study, one high-rise megadorm and a 
number of low-rise dormitories. The megadorm con- 
sisted of a tower of 10 floors occupied by males and a 
tower of 13 floors occupied by females. The megadorm 
housed approximately 3,000 students. The low-rise 
dormitories consisted of two dormitories for males (one 
of two floors and one of four floors) and two dormitories 
for females (one of two floors and one of five floors). 
Each low-rise dormitory housed approximately 250 
students. These settings provided an ideal test of the 
effects of differential height in university residential 
housing because the low- and high-rise settings were
equated for both amount of living space per student and
for number of residents per floor. 
Measures


Social competence. Social competence was mea- 
sured by the Texas Social Behavior Inventory (TSBI) 
developed by Helmreich and Stapp (1974). The TSBI 
is a 16-item multiple-choice scale designed to assess 
individual self-perceptions of social competence and 
social self-esteem. Helmreich and Stapp (1974) report 
that factor analysis of the TSBI has yielded factors of 
confidence and dominance along with social compe- 
tence, and that the scale correlates highly (r = .50, p < 
.001) with the Self-esteem scale of the California Per- 
sonality Inventory. 

Sociometric Scale. Through discussions with dor- 
mitory staff, it was felt that for university freshmen new 
to the campus setting, the establishment of proximal 
and easily accessible friendship networks within the 
dormitory setting represented a particularly important 
component of positive adjustment to residential life. 
This assumption was reinforced through input from 
resident assistants that a chief complaint of dissatisfied 
residents in the megadorm concerned social isolation. 
Based on this information, we decided to develop a so- 
ciometric technique to measure the level of friendship 
networks established within the residential setting. We 
felt a valid measure should tap meaningful relationships 
in addition to casual social contacts. Thus, a measure
of dormitory-based friendships was developed, oriented 
toward measuring friendships at three levels of inti- 
macy: casual-recreational, personal-conversational, 
and supportive. The items that tapped each of these 
levels respectively asked the respondent to identify (a) 
a friend to join in a casual outing to a film or a sports 
event, (b) a friend to join you in a personal conversation 
where attitudes and values are shared, and (c) a friend 
to join you in discussing an intimate personal problem 
concerning your feelings about a member of your family. 
In responding to the scale, students were asked to
choose up to five friends at each intimacy level who
resided within their own dormitory. For quantification
purposes, choices were weighted (from 5 for a first
choice to 1 for a fifth choice), and a final score was
obtained by summing over the three friendship levels.
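
The scoring rule just described can be sketched as below. This is one reading of the text, offered as an assumption: each named friend contributes the weight of the rank at which he or she was listed (5 for the first choice down to 1 for the fifth), so a level's score depends on how many dormitory friends were named, and the final score is the sum across the three intimacy levels. The level keys and function names are hypothetical.

```python
# Hypothetical keys for the three intimacy levels named in the text.
LEVELS = ("casual_recreational", "personal_conversational", "supportive")
MAX_CHOICES = 5


def level_score(friends_named):
    """Sum of rank weights for one intimacy level: 5, 4, 3, 2, 1."""
    count = len(friends_named)
    if count > MAX_CHOICES:
        raise ValueError("at most five friends may be chosen per level")
    # rank 0 is the first choice (weight 5); rank 4 is the fifth (weight 1)
    return sum(MAX_CHOICES - rank for rank in range(count))


def friendship_score(choices_by_level):
    """Total dormitory-based friendship score across the three levels."""
    return sum(level_score(choices_by_level.get(level, [])) for level in LEVELS)


# Toy respondent (assumed data): five casual friends, two conversational
# friends, and no one named at the supportive level.
example = {
    "casual_recreational": ["A", "B", "C", "D", "E"],
    "personal_conversational": ["A", "B"],
    "supportive": [],
}
example_score = friendship_score(example)
```

Under this reading the score ranges from 0 (no dormitory friends named) to 45 (five friends named at every level).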

Environmental satisfaction. A measure of satis- 
faction with the living environment was developed, 
which was composed of 10 scaled (7-point) items. The 
items measured satisfaction with meeting people and 
making friends, recreation, opportunities and places for 
personal conversation, finding help or support for a 
personal problem, comfort, privacy, student influence 
in policy decisions, physical layout of building and 
rooms, furnishings, and overall feelings about living in
the particular dormitory. The predictive validity of the 
Satisfaction Scale was determined by correlating the 
satisfaction score with students' decisions to remain in 
or leave the particular dormitory for the next academic 
year. A statistically significant correlation was ob- 
tained (r = .41, p < .01). 


Procedure 


The measures were administered in a survey that 
required approximately 15 minutes for completion. 
The surveys were handed out to each subject personally
by the resident assistant on his or her floor. The subject
was asked to complete the survey anonymously and to
return it to the assistant's mailbox within 2 days. The
survey was conducted simultaneously across all settings
1 month before the end of the spring semester.
Experimental assistants working with the resident
assistants assured a standardized administration
procedure. There were 120 forms administered in the
megadorm and 60 to 100 forms in the male and female
low-rise dorms, respectively. The return rate was 62%


Table 1
Mean Scores by Dormitory, Sex, and Competence on the Residential Satisfaction and
Friendship Formation Measures

                                        Males                            Females
                            High competent  Low competent    High competent  Low competent
Measure                       M      SD       M      SD        M      SD       M      SD

Low-rise dormitories
  Residential satisfaction  54.23   9.92    42.56   8.16     51.75  14.62    48.35  12.32
  Friendship formation      15.00   8.61    12.11   7.64     20.90   8.85    21.00   7.65

Megadorm
  Residential satisfaction  39.34  10.54    38.00   8.00     36.79  13.33    45.75  10.76
  Friendship formation      16.32  10.69    17.36   7.89     11.14   8.61    21.35   9.07

Results 


Table 1 shows mean responses by dormitory,
sex, and social competence on the
satisfaction and friendship measures. In the
statistical analysis, social competence was
treated as a blocking variable, with high and
low groups established through a median
split. Table 2 summarizes the results of the
2 X 2 X 2 multivariate analysis of variance
(Dormitory X Sex X Social Competence),
with residential satisfaction and friendship
formation as dependent variables. For the
multivariate Fs, significant effects were
found for the dormitory main effect and for
all three two-way interactions.


Table 2
Results of Three-Factor Multivariate Analysis of Variance (Dormitory X Sex X Competence)
with Residential Satisfaction and Friendship Formation as Dependent Variables

                    Multivariate F     Univariate Fs (df = 1, 121)
Source              (df = 2, 120)      Satisfaction     Friendship formation

Dormitory (A)          9.71**            19.47**           1.46
Sex (B)                2.97***           [illegible]       5.92*
Competence (C)         1.28               .04              2.31
A X B                  3.80*             [illegible]       [illegible]
A X C                  3.43*             [illegible]       3.37***
B X C                  4.14*              5.95*            4.05*
A X B X C               .37               .36               .23

*p < .05.
**p < .01.
***p < .07.
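
The median split used to form the high- and low-competence blocks can be sketched as follows. The article does not say how scores falling exactly at the median were assigned, so the rule used here (scores at or below the median go to the low group) is an assumption, as are the function name and the toy scores.

```python
from statistics import median


def median_split(scores):
    """Label each score 'high' or 'low' relative to the sample median.

    Assumption: ties at the median are assigned to the 'low' group.
    """
    cut = median(scores)
    return ["high" if score > cut else "low" for score in scores]


# Toy social competence scores (assumed data).
toy_scores = [31, 44, 52, 38, 47, 60]
toy_groups = median_split(toy_scores)
```

Crossing the resulting two-level factor with dormitory type and sex gives the eight cells of the 2 X 2 X 2 design summarized in Table 2.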

For the univariate analyses, two signifi- 
cant main effects were found. Residents of 
the low-rise dormitories scored significantly 
higher on the measure of residential satis- 
faction than did residents of the megadorm. 
On the friendship measure, female residents 
scored significantly higher than did males. 
In addition, a number of interesting two-way 
interactions occurred in the univariate 
analyses. 

On the satisfaction measure, two interactions
were significant: Dormitory X Social
Competence and Sex X Social Competence.
In the low-rise dormitories, high-competent
students were more satisfied than
low-competent ones; whereas in the megadorm,
low-competent individuals were slightly
more satisfied than high-competent ones.
For male residents, social competence was
positively related to satisfaction; whereas for
females, social competence showed an inverse
relationship to satisfaction.

On the friendship measure, two interactions
were significant (Dormitory X Sex and
Sex X Social Competence) and another
showed a statistical trend
(Dormitory X Social Competence). In the
low-rise dormitories, females scored
markedly higher on the index of friendship
formation than did males; whereas in the
megadorm, there was no relationship be- 
tween sex and friendship formation. For 
female residents, social competence showed 
a strong negative relationship to friendship 
formation; whereas for males, no relation was 
observed between social competence and 
dormitory-based friendships. Also, in the 
megadorm, low-competent subjects scored 
higher on the Friendship Scale than did 
high-competent ones; whereas in the low-rise 
dormitories, the reverse was true. 


Discussion 


These results support the negative picture 
of residential life in high-rise as opposed to 
low-rise dormitory settings reported in ear- 
lier research (Baron et al., 1976; Bickman et 
al., 1973; Valins & Baum, 1973; Wilcox & 
Holahan, 1976). Residents of the megadorm 
were significantly more dissatisfied than 
residents of the low-rise dormitories. The 
range of dissatisfaction in the megadorm was 
very broad, including feelings about social 
contact and support, features of the physical 
environment, and student involvement in 
policy decisions. 

While female residents established more 
dormitory-based friendships than did males, 
an especially interesting finding was the in- 
teraction between sex and type of dormitory 
in affecting friendships, reflecting differen- 
tial styles of responding to environmental 
settings as a function of sex. While present 
knowledge in this area is limited, Holahan 
and Holahan (1976) present data that relate 
to the present findings. In examining verbal
descriptions of university dormitory envi-
ronments, they found that females focused
particularly on the reduced personalization 
of the dormitory setting, whereas males 
emphasized the increased social contact 
possible in a communal living environment. 
In the present study, the reduced personal- 
ization in the megadorm relative to the 
low-rise dormitories may have discouraged 
friendship formation on the part of females. 
Males, in contrast, may have been more in- 
clined to develop friendships in the mega- 
dorm than in the low-rise dormitories due to 
the relatively greater level of social contact 
that they permitted. 

The interactions between social compe-
tence and type of dormitory in affecting both
residential satisfaction and friendship for- 
mation were complex and somewhat sur- 
prising. The nature of the relationship be- 
tween these two measures of residential ad- 
justment may clarify the finding that in the 
megadorm, social competence was inversely 
related to both measures. In the highly 
dissatisfying megadorm environment, more 
socially competent residents may have 
elected not to establish dormitory-based 
friendships, thus indirectly facilitating the 
development of friendships among less so- 
cially competent students. These friend- 
ships in the megadorm may in turn have 
provided a foundation for the higher level of 
residential satisfaction among less compe- 
tent students. 

There are two sources of data that lend 
some support to this interpretation. The 
friendship formation measure was signifi- 
cantly related to residential satisfaction (r 
= .28, p < .01), and additional anecdotal 
data indicated a strong tendency for high 
socially competent residents in the mega- 
dorm to form more friendships outside of the 
dormitory setting than did less competent 
students. Of course, caution needs to be 
exercised in generalizing these findings to 
students in other university settings. The
subject sample we have examined here was 
selected both in terms of the characteristics 
of students who initially chose dormitory 
living at this university and in terms of those 
students who responded to the survey. In 
addition, an important direction for ex- 
tending the present study might involve the 
collection across settings of quasi-behavioral 
measures that reflect the effects of stress or 
reduced adjustment. Such measures might 
include grade point average, school dropouts, 
and visits to the university health center.

The most important implication of the 
present findings for university educators is 
to underscore the importance of viewing 
student adjustment from an interactional 
perspective. This concern is congruent with 
increasing research evidence that adequate 
prediction of social behavior necessitates the 
measurement of both personality and situ- 
ational variables (cf. Bem & Allen, 1974; 
Ekehammar, 1974; Endler & Hunt, 1968;
Kjerulff & Wiggins, 1976; Mischel, 1973;
Moos, 1973). Clearly, no single type of ed- 
ucational environment will be ideally suited 
to the needs of all students. Adequate ed- 
ucational planning will necessitate a fuller 
appreciation of how different types of stu- 
dents are likely to respond to and attempt to 
cope with contrasting environmental set- 
tings. The interactional perspective appears 
to present an especially exciting area for 
further research of both a theoretical and 
practical nature on the part of educational 
psychologists.
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Stephen G. Green 
Department of Management 
University of Cincinnati 


This experiment studied the impact of aptitude test scores and past perfor-
mance information upon causal attributions about a student's poor present
performance. Graduate students (N = 69) received either past performance
information alone or past performance information and aptitude scores. The
presence of low aptitude scores led to greater ability attributions regardless of
past performance while not affecting effort attributions. The presence of high
aptitude scores led to greater effort attributions given poor past performance
and a tendency to lesser effort attributions given successful past performance
while not affecting ability attributions. Ability attributions also were found to
be negatively related to expectancies about the student's future success. The
implication of these results for students and educators is discussed.


Judgments about the causes of a stu- 
dent's performance, or causal attributions, 
are made daily. This process, however, is 
never more likely, nor more critical, than 
when a student is doing poorly. It is espe- 
cially at this time that both educator and 
student want to know the cause of the stu- 
dent’s performance in order to best deter- 
mine a remedy. Two readily available 
sources of information one might use in
trying to determine an explanation are
past performance records and aptitude test 
scores. Given that educators may use apti- 
tude test scores in such a manner and that 
these scores have been shown to have ques- 
tionable validity as predictors of academic 
performance (Larson & Scontrino, in press; 
Marston, 1971; Thacker & Williams, 1974), 
it seems important to assess their impact 
upon educators' judgments about the causes
of student performance. In particular, this 
article is concerned with examining (a) 
whether causal attributions about a poorly 
performing student's abilities and efforts are 
substantially different and (b) how they are 
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Thomas Taber for their editorial comments on an ear- 
lier draft of this article, 
different, if those attributions are based on
both aptitude test scores and past perfor- 
mance information as opposed to past per- 
formance information alone. 

Aptitude test scores, the Graduate Record 
Examination in this case, are touted to be 
measures of a student's ability to perform
academically, with low aptitude scores ob-
viously denoting a lack of ability and high
aptitude scores indicating substantial ability
to perform as a student. Because these scores
are apparently widely accepted as a valid
measure of a student's ability, it is expected
that in some situations their availability to
persons judging a student's performance will
affect ability judgments. In the instance in
which past performance and aptitude scores 
are consistent (e.g., success in the past and 
high scores), I felt that the presence or ab- 
sence of the consistent aptitude scores would 
not significantly affect ability attributions 
about present performance. When a person 
has performed consistently well in the past, 
there is already a tendency to explain that 
performance in terms of ability (Frieze & 
Weiner, 1971); a high aptitude score should
only agree with the original attribution and
thus not affect it. When past performance
and aptitude scores are inconsistent (e.g.,
success in the past and low scores), however,
the configuration is more interesting and
informative because here we have the two
sources of information competing as possible
explanations of the student’s poor perfor-
mance, thus providing a more stringent test
of their relative effects upon ability attri-
butions.

When a poorly performing student has
done well in the past, it is expected that the
presence of a low aptitude test score will re-
sult in the student’s poor present perfor-
mance being attributed more to ability than
if those aptitude scores were not available.
The low aptitude score essentially indicates
a lack of ability and, therefore, suggests a
causal explanation for the student's present
failure in spite of his past successes. Simi-
larly, when a student has failed in the past,
the presence of a high aptitude test score
should result in his poor present performance
being attributed less to ability. The high
score indicates a presence of ability and,
therefore, discounts that factor as a causal
explanation for the student’s poor perfor-
mance.


... in explaining that ...; that is, ei-
ther a lack of ability or low effort would be
sufficient to cause ... (see Kun &
Weiner, 1973, for a detailed explanation).
Under these conditions, Kun and Weiner
suggest that ability and effort are
perceived as negatively covarying causes
... to the extent that ...
test scores does ...

When a student has done well in
the past, the presence of low aptitude scores
should result in the student’s poor present
performance being attributed less to effort
than if those scores were not present.
When a student has failed in the
past, the presence of high aptitude scores
should result in his poor present perfor-
mance being attributed more to effort than
if those aptitude scores were not present.
Finally, differences in causal attributions
may have effects upon the
student's educational experience. One im-
portant element which might be affected by
these differences in attribution is expec- 
tancies about the student’s future perfor- 
mance. Past research has indicated that be- 
cause ability is seen as a stable attribution, 
ability attributions result in expectancies 
concerning future performance which are 
similar to the present performance; that is, 
the more ability is seen as the cause of fail- 
ure, the higher the expectancy that future 
performance will be a failure (Frieze &
Weiner, 1971; McMahan, 1973; Valle &
Frieze, 1976; Weiner, Nierenberg, & Gold-
stein, 1976). Therefore, it is expected that
the more ability is seen as the cause of the
student’s poor present performance, the
lower the subjective probability that the
student will succeed in the future.

Because effort has both stable and un-
stable characteristics, however, its rela-
tionship to future expectancies is more pro-
blematic. Past research indicates that ex-
pectancies of success following failure can be
either unassociated with or positively asso-
ciated with effort attributions about that
failure (McMahan, 1973; Valle & Frieze, 
1976). Which of these relationships will be 
obtained cannot be clearly predicted. Nev- 
ertheless, it does not seem likely that a strong 
negative relationship between effort attri- 
butions and future expectancies of success 
would be found as is expected in the case of
ability attributions.


Method


Subjects


... questionnaires were returned, ... as the ...


Procedure


In each questionnaire a hypothetical student, John,
was described as having one of two levels of past per-
formance, high or low. In the high past performance
condition, John was described as having done “very
well” in high school (... class), having ... (... at the
92nd percentile of his graduating class), and being
accepted in 10 out of 10 graduate programs to which he
applied. In the low past performance condition, John
was described as having done “rather poorly” in high
school (GPA = 2.00 and bottom 40% of his senior class),
having had a 2.47 undergraduate GPA (ranking at the
47th percentile of his graduating class), and as being
accepted at only 1 out of 10 graduate programs to which
he applied. John was also described as having either
high Graduate Record Examination (GRE) scores (verbal =
700; quantitative = 600) or low GRE scores (verbal = 490;
quantitative = 500), or no score information was given
at all.

It should be noted that while the low scores were
considerably lower than the high scores, they were not
low in any absolute sense and were well within the range
of scores represented in a typical graduate program
(Wilson, Note 1). Also, to provide a realistic and con-
servative test of the impact of aptitude scores, the scores
were in no way labeled as low or high or interpreted for
the subjects. Finally, John's present performance was
described in all conditions as poor, with John being
portrayed as not doing well in his first year in the
graduate program, accumulating a GPA of 2.91 his first
year, and being ranked by the faculty in the bottom 40%
of his graduate class in terms of performance.


Dependent Variables


Each subject received information about one student,
John, and rated the extent to which the level of John’s
ability and effort were the cause of his poor present
performance on two independent 6-point scales (6 =
very much a cause; 1 = not a cause). The subjects also
assigned a probability (0%-100%) to John’s succeeding
as a student and going ahead to complete his doctoral
program.


Analysis


A 2 X 3 analysis of variance was performed on the
ability and effort attributions and the probability of
future success scores (see Table 1). The first factor was
the student’s past performance (high/low); the second
factor was the three aptitude test score conditions (no
scores/low scores/high scores). In that the hypotheses
involved comparisons between specific cells, planned
contrasts were performed on the attributions to ability
and effort and the expectancy scores. The results are
discussed in terms of the planned contrasts.


Table 1
Analysis of Variance of Attributions to Ability and Effort and Probabilities of Future Success

                             Ability               Effort              Probabilities
Source                  df    MS       F       df    MS      F       df     MS        F
Past performance (A)     1   37.44   18.89**    1   9.44    5.04*     1   1,022.71   2.38
Aptitude scores (B)      2   10.58    5.34**    2    ...     .89      2      71.65    .17
A X B                    2    1.90     .96      2   7.87    4.20*     2     280.71    .65
Error                   63    1.98             62   1.87             63     429.98

* p < .05.
** p < .01.


Results


Ability Attributions


When the student had high past perfor-
mance and low aptitude scores were avail-
able, his poor present performance was at-
tributed more to ability, F(1, 63) = 6.01, p <
.02, than when those aptitude scores were
not available (see Table 2). Also, the pres-
ence of high aptitude test scores did not
significantly affect the attributions to ability,
F(1, 63) = .28. These results fully support
the hypotheses. When the student had low
past performance and high aptitude scores
were available, his poor present performance
tended to be attributed less to ability than
when the aptitude scores were not present;
the difference, however, was not significant,
F(1, 63) = 1.78. Additionally, the presence
of low aptitude scores resulted in the stu-
dent’s poor present performance being at-
tributed even more to ability than when
those aptitude scores were unavailable, F(1,
63) = 4.53, p < .05, apparently reaffirming
the belief that the student’s level of ability
was the difficulty (see Table 2).
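
The planned contrasts reported above can be illustrated with a short calculation. The following is a minimal sketch in Python, assuming (the article does not give the formula) that each contrast compared two cell means against the pooled error mean square from Table 1. The function name `contrast_f` is hypothetical. The inputs are the ability means for the high past performance conditions with no scores and with high scores (1.88 and 2.20, with cell sizes of 8 and 15 as read from Table 2) and the error mean square for ability (1.98).

```python
def contrast_f(mean_a, n_a, mean_b, n_b, ms_error):
    """F ratio (1 numerator df) for a pairwise contrast between two cell
    means, using the pooled error mean square from the overall ANOVA."""
    diff = mean_a - mean_b
    return diff ** 2 / (ms_error * (1.0 / n_a + 1.0 / n_b))


# Ability attributions, high past performance:
# no scores (M = 1.88, n = 8) versus high scores (M = 2.20, n = 15).
# The values are read from Tables 1 and 2; the contrast formula is an assumption.
f_high_scores = contrast_f(1.88, 8, 2.20, 15, 1.98)
print(round(f_high_scores, 2))
```

Under these assumptions the ratio comes to about .27, close to the reported F(1, 63) = .28; the small difference may reflect rounding of the published means.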


Effort Attributions¹


... those scores were not
present, F(1, 62) = .12. The availability of
high aptitude scores with high past perfor-
mance, however, unexpectedly resulted in
the student's poor present performance
being seen as somewhat less due to effort,
F(1, 62) = 3.47, p < .07, than when those
aptitude scores were not available (see Table
2). This tendency is somewhat puzzling, but
it may be that when the subjects were faced
with a student who had done well in the past
and had high aptitude scores, they were
unsure of the cause of the present poor per-
formance or felt that with such an exemplary
student the cause must be some factor
external to the student or a temporary, un-
stable factor such as illness. When the stu-
dent had low past performance and high
aptitude test scores were present, as pre-
dicted, his poor present performance was
attributed more to effort, F(1, 62) = 4.45, p
< .04, than when those aptitude scores were
not present (see Table 2). Similarly sup-
porting the hypothesis, low aptitude scores
did not affect the attributions to effort, F(1,
62) = 2.14.

¹ In analyses employing the effort ratings, one subject
is not included because of missing data.


Table 2
Mean Attributions to Ability and Effort and Probabilities of Future Success

                        High past performance                 Low past performance
                   No      Low scores      High scores    No      Low scores    High scores
Causal factors     scores  (inconsistent)  (consistent)   scores  (consistent)  (inconsistent)
Ability
  M                 1.88     3.44            2.20          4.12     4.62          3.30
  SD                 ...      ...            1.61           .99      .96          1.12
  n                    8        9              15             8       13            16
Effort
  M                 5.25     4.33            4.13          3.12     3.25          4.38
  SD                 ...      ...            1.68          1.13     1.36          1.36
  n                  ...      ...              15             8      ...           ...
Probability of future success
  M                40.00    47.78           43.67         36.88    31.54         39.38
  SD               26.19    22.79           25.81         13.87    17.72         15.69
  n                    8        9              15             8       13            16

Note. Larger numbers indicate more causality assigned or a higher probability.


Relationship of Ability and Effort
Attributions


After the cell mean was subtracted from each
subject’s ratings in order to remove treat-
ment effects from the data, a correlation of
−.92 (p < .04) was found between ability and
effort attributions. This finding supports the
expectation that these attributions can be
used in a sufficient causal schema and thus
covary negatively.


Expectancies


It also was proposed that changes in at-
tributional explanations of a student's poor
performance might be related to expec-
tancies about that student's future perfor-
mance. Although the analysis of variance of
the future expectancies did not reveal any
significant differences, a correlational
analysis did provide support for the hy-
potheses. As predicted, causal attributions
to ability were negatively related to expec-
tancies of improved future performance
when the effect of effort attributions was
partialed out (r = −.32, p < .01), whereas no
such relationship was found between effort
attributions and future expectancies when
the effects of ability attributions were par-
tialed out (r = −.04).
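
The partialing step described above can be sketched from the three zero-order correlations. This is a minimal Python example, assuming the standard first-order partial correlation formula; the article does not state how the partialing was carried out. The function name `partial_r` and the input correlations are hypothetical and are not values from the study.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y with z held constant,
    computed from the three zero-order correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical zero-order correlations (illustrative only, not study data):
r_ability_expectancy = -0.30  # x = ability attribution, y = expectancy
r_ability_effort = -0.20      # x with z = effort attribution
r_effort_expectancy = 0.05    # y with z

# Ability-expectancy correlation with effort attributions partialed out.
print(round(partial_r(r_ability_expectancy, r_ability_effort, r_effort_expectancy), 2))
```

When the control variable is uncorrelated with both of the other variables, the partial correlation equals the zero-order correlation, which is a quick sanity check on the function.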


Discussion 


Clearly, the causal attributions about a 
poorly performing student's abilities are 
substantially different if those attributions 
are based on both aptitude test scores and 
past performance as opposed to past per- 
formance information alone. A student with 
poor past performance can expect the 
availability of low aptitude scores to serve to 
increase the probability that any present 
performance difficulties would be attributed 
to the level of his ability while not substan- 
tially affecting effort attributions. High ap- 
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titude scores in this situation would serve 
mainly to increase the likelihood of an effort 
attribution but essentially would not dispel 
the belief that he also has ability difficulties. 
Furthermore, it appears that a student with 
an excellent record of performance in the 
past and low aptitude test scores also runs 
the risk of having any present performance 
difficulties attributed more to ability than 
would occur if aptitude scores were not 
available. The presence of low aptitude 
scores, however, did not affect the effort at- 
tributions for the poorly performing student. 
Finally, high aptitude scores coupled with 
successful past performance apparently re-
duce the tendency to explain a student’s poor
present performance in ability or effort
terms. Perhaps attributions would shift 
toward an external factor such as task diffi- 
culty under these circumstances (Weiner et 
al., 1972) or to some other unstable and 
temporary factor, such as illness. 

Given the lack of any demonstrated rela- 
tionship between effort attributions and 
expectancies of future performance, the 
importance of the consequences of shifts in 
attributions to effort due to the presence of 
aptitude scores is not clearly interpretable. 
In the case of the demonstrated negative 
relationship between attributions to ability 
and expectancies of future success, however, 
the differences in attributions to ability due 
to the presence of aptitude test scores are of
greater concern. First, it is clear that apti-
tude scores can lead to differences in the
attribution of ability which are associated 
with differences in expectancies about a 
student's future performance. This fact 
opens the door to several varieties of bias in 
the student-teacher interaction, not the least 
of which is the Pygmalion effect (Rosenthal 
& Jacobson, 1968). Second, it appears that 
the availability of aptitude test scores po- 
tentially has a greater negative impact than 
a positive impact for the student. When the
aptitude scores are low, any performance
difficulties in the present clearly are attrib-
uted more to ability than if those aptitude
scores were not available, even if the student
has a record of superior past performance.
Thus, the student risks the possibility of
lowered expectancies that he can improve in
the future, creating the problems mentioned
above. When the aptitude scores are high,
however, they do not seem to have an 
equivalent positive effect for the student. 
The presence of high aptitude scores does 
not appear to be able to reliably reduce the 
salience of an ability explanation and thus 
allow the possibility of increased expec- 
tancies of future success. 

The generalizability of the present find- 
ings is limited in that (a) a single type of 
aptitude measure was used and (b) the 
subjects were not themselves educators. 
Graduate Record Examination scores, 
however, were probably as salient and as well 
understood by the graduate student subjects 
as these scores, or any other such aptitude 
test scores, are understood by most educa- 
tors. Clearly, other research is needed to 
determine whether the findings presented 
here do generalize to other settings. 


Reference Note 


1. Wilson, V. Profile of University of Washington 
psychology graduate students entering fall, 1974. 
Unpublished manuscript, University of Washington,
1974. 


Worry, Emotionality, and Task-Generated Interference
in Test Anxiety: An Empirical Test of Attentional Theory


Jerry L. Deffenbacher 
Colorado State University 


This study investigated sources of interference in highly test anxious subjects 
performing under evaluative stress. Students from the upper and lower 30% 
of the Test Anxiety Scale distribution solved difficult anagrams under two
evaluative conditions: high stress (evaluative) and low stress (nonevaluative). 
Major findings were that the high-anxiety-high-stress group (a) reported 
more anxiety during testing; (b) rated themselves, their abilities, and the task 
more negatively; (c) solved fewer anagrams; (d) estimated spending less time 
on task; (e) experienced more interference from anxiety; and (f) reported 
greater distraction of attention to heightened autonomic arousal (emotional- 
ity), worrisome thoughts (worry), and task-produced competing responses 
(task-generated interference) than did either the high-anxiety-low-stress or 
low-anxiety-high-stress group. Findings were interpreted in terms of atten-
tional theories of anxiety.


Studies of the relationship between test 
anxiety and performance have consistently 
demonstrated that high-test-anxious indi- 
viduals perform more poorly than low-test- 
anxious individuals in a variety of contexts, 
for example, on classroom tests (Alpert & 
Haber, 1960; Munz & Smouse, 1968; Paul & 
Eriksen, 1964), grade point averages (Alpert 
& Haber, 1960; Sarason, 1963), intelligence 
and aptitude tests (Alpert & Haber, 1960; 
Sarason & Mandler, 1952), and reading tests 
(Cotler, 1969; Kestenbaum & Weiner, 1970). 
The lower performance of the highly anx- 
ious, however, is not a simple artifact of 
ability, since the highly anxious are not de- 
monstrably less capable. Laboratory stud- 
ies (Sarason, 1961, 1972, 1973) have shown 
that the highly test anxious perform as well 
as or better than the less anxious when eva-
luative stress is low. Similar results also
have been found for naturally occurring
tests. For example, high-test-anxious stu-
dents perform more poorly on tests given in
the regular manner, but not on low-stress
(Gaudry & Bradshaw, 1970; Paul & Eriksen,
1964) or humorous (Smith, Ascough, Et-
tinger, & Nelson, 1971) formats.

I wish to thank Larry Bloom and Eugene Oetting for
their thoughtful comments and suggestions regarding
earlier drafts of this manuscript.

Requests for reprints should be sent to Jerry L. Def-
fenbacher, Department of Psychology, Colorado State
University, Fort Collins, Colorado 80523.

Performance of the highly test anxious 
varies with evaluative stress. When evalu- 
ative stress is low, the high anxious perform 
as well as the low anxious; however, under 
high evaluative stress they perform at levels 
lower than either the low anxious or them- 
selves when stress is low. Evaluative stress 
appears to elicit behaviors which interfere 
with performance of the highly anxious. 
But what, specifically, are the sources of in- 
terference? What is it that the highly anx- 
ious do under high stress that the less anx- 
ious do not do or that even the highly anx- 
ious do not do when stress is reduced? 
Recent theory regarding test anxiety 
(Sarason, 1972; Wine, 1971) has interpreted 
the performance deterioration in terms of 
selective attention. According to this in- 
terpretation, highly anxious individuals 
under stress respond with personalized, 
self-oriented responses which direct atten- 
tion away from the task; therefore, perfor- 
mance suffers as a lower proportion of time 
is spent on the task itself. If attentional 
theory is valid, then as evaluative stress in- 
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creases, anxiety-related interference of the 
highly test anxious should increase and time 
on task and performance should decrease. 
The present research sought to test these 
predictions. 

Three classes of distractors or interfering 
behaviors were operationalized: worry, 
emotionality, and task-generated interfer- 
ence. Worry and emotionality were origi- 
nally described by Liebert and Morris 
(1967). Worry refers to cognitive concern 
about performance, consequences of failure, 
evaluation of one’s ability relative to others, 
and the like. Emotionality refers to self- 
perceived physiological arousal and upset, 
for example, heart racing and upset stomach. 
Research has shown worry to be inversely 
related to both performance (Deffenbacher, 
1977; Doctor & Altman, 1969; Morris & 
Liebert, 1970) and performance expectations 
(Doctor & Altman, 1969; Liebert & Morris, 
1967; Morris & Liebert, 1970). Though 
generally negatively related, emotionality 
has been less consistently and, in most cases, 
less strongly related to performance-related 
indices (Deffenbacher, 1977; Doctor & Alt- 
man, 1969; Morris & Liebert, 1970). The 
third source of interference, task-generated 
interference, was derived from drive theories 
of anxiety (Spence & Spence, 1966; Spiel- 
berger, 1966), which suggest that highly 
anxious people are more susceptible to 
task-produced competing responses under 
high drive conditions. Thus, within an at- 
tentional interpretation, the highly anxious 
may have attention directed away from the 
task to heightened autonomic arousal (emo- 
tionality), self-oriented cognitions (worry), 
and/or competing response tendencies gen-
erated by the task (task-generated interfer-
ence).

Employing a 2 X 2 factorial design (Low- 
High Subject Test Anxiety X Low-High 
Manipulated Stress), I predicted that 
subjects in the high-anxiety-high-stress 
group would (a) react more negatively to 
testing; (b) perform more poorly; (c) spend 
less time on task; (d) report greater inter- 
ference due to anxiety; and (e) report greater 
interference due to worry, emotionality, and 
task-generated interference than either the 
high-anxiety-low-stress or the low-anxi-
ety-high-stress groups.
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Method 


Subjects 


The Test Anxiety Scale (Sarason, 1972) was admin-
istered to 185 students in a sophomore level psychology
of personality class. The upper (scores > 20) and lower
(scores < 12) 30% of the scale distribution operationally
defined high and low test anxiety groups, respectively. 
Thirty-four (14 males and 20 females) high-anxious and 
34 (13 males and 21 females) low-anxious volunteers 
were randomly assigned, within constraints of main- 
taining sex ratio within groups, to either high- or low- 
stress conditions (n = 17 per group). All subjects re- 
ceived 5 bonus points toward their course grade. 
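
The group definition above can be sketched in code. This is a minimal Python example that assumes only the cutoffs stated in the text (scores above 20 for the high-anxiety group, scores below 12 for the low-anxiety group). The function name `anxiety_group` and the sample scores are hypothetical, not data from the study.

```python
def anxiety_group(score, low_cutoff=12, high_cutoff=20):
    """Classify a Test Anxiety Scale score using the cutoffs given in the text.

    Returns "high" for scores above high_cutoff, "low" for scores below
    low_cutoff, and None for the middle of the distribution (not eligible
    for either group).
    """
    if score > high_cutoff:
        return "high"
    if score < low_cutoff:
        return "low"
    return None


sample_scores = [5, 11, 12, 16, 20, 21, 30]  # hypothetical scale scores
groups = [anxiety_group(s) for s in sample_scores]
print(groups)
```

Scores equal to either cutoff fall in the excluded middle range here because the text gives strict inequalities.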


Instruments and Materials 


Test Anxiety Scale. This scale is a 37-item true-false 
instrument measuring self-reported test anxiety (see 
Sarason, 1972, for an extensive review of the scale). 

Experimental stress manipulation. Experimental 
stress was manipulated through a three-paragraph set 
of written instructions. The first two paragraphs were 
common to both stress conditions. The first paragraph
described the project as research on problem solving and 
stressed the confidential nature of the results. The 
second paragraph described the task as an anagrams 
task and provided an example of an anagram and how 
to solve it. The third paragraph varied with the stress 
condition. High-stress instructions were adapted from 
Sarason’s (1961, 1972, 1973) ego-involving instructions 
and stressed the intelligence-testing nature of the task, 
the low-difficulty level of the anagrams, the time-lim- 
ited nature of the task, and the importance of solving 
as many anagrams as possible to compare well with 
others. Low-stress instructions were modeled after 
Sarason's (1958, 1972) reassurance condition: (a) 
stressing the high difficulty of the anagrams and the 
likelihood of solving only a few anagrams and (b) con- 
taining suggestions not to worry, but to relax and take 
anagrams one at a time. 

Anagrams. Sarason's (1961, 1973) 13 high-difficulty 
anagrams were typed in capital letters and arranged in 
two columns on a single page. Below each anagram was 
a space for the subject's solution. 

Posttask questionnaire. The posttask questionnaire
asked subjects to indicate the extent of certain thoughts,
feelings, or behaviors during testing by circling a num-
ber on a rating scale. The first page contained ratings
of (a) anxiety level during testing (0 = totally calm; 10
= extremely anxious, panicked); (b) influence of anxiety
(−1 = interfered with performance; 0 = did not influ-
ence performance; 1 = aided performance); (c) feelings
about ability (0 = very negative; 10 = very positive); and
(e) pleasantness of taking the test (0 = very unpleasant;
10 = very pleasant).

Subsequent pages contained 26 questions asking
subjects how often they performed certain behaviors or
attended to various things while taking the test. Rat-
ings were framed within the attentional theory of test
anxiety on an 11-point scale with 0 representing
“never,” “not at all,” “I did not think about it or do it
during the test” and 10, “always,” “constantly,” “I
thought about it or did it so much that I could not con-
centrate.”


Table 1
Intercorrelations of Dependent Variables

Measure                             1     2     3     4     5     6     7     8     9    10
 1. Anxiety rating                  —   −.87  −.49  −.40  −.97  −.40  −.52   .69   .63   .48
 2. Feelings—ability                      —    ...
 3. Feelings—self                               ...
 4. Task pleasantness                                 ...
 5. Performance                                             ...
 6. Estimated % time on task                                      ...
 7. Anxiety interferenceᵃ                                               ...
 8. Emotionality                                                              —    .66   .50
 9. Worry                                                                           ...
10. Task-generated interference                                                           ...

Note. r = .30, p < .05; r = .29, p < .01; r = .36, p < .001.
ᵃ The signs of all correlations with anxiety interference have been reversed to fit the logical correlation, rather than the confusing
correlations introduced by the measurement procedures.


A 9-item emotionality scale included items in which 
attention was directed toward cues of physiological 
arousal or upset, for example, upset stomach, perspiring, 
heart racing. Emotionality items were adapted from
Liebert and Morris (1967) with additional items created 


evaluation. Some items in the worry scale were drawn 
from Liebert and Morris (1967), Mandler and Watson
(1966), and Marlett and Watson (1968); others I con- 


A 5-item task-generated interference scale was de- 
veloped following the rationale of Spielberger (1966). 


Procedure 


Groups of 30-40 subjects met at the same time of day


posttask questionnaire and were debriefed.


Results 


Table 1 presents the correlations between 
dependent variables for the sample as a
whole. Inspection of this table shows that 
all dependent measures were significantly 


intercorrelated and, therefore, should not be 
considered independent constructs or re- 
sponse classes. A degree of interrelation- 
ship, however, was not unexpected. Several 
of the relationships were predicted by at- 
tentional theory. Also, subjects were se- 
lected on level of test anxiety which research 


share method variance. 

Aware of the interrelationship of variables, 
data, however, were analyzed by analysis of 
variance (ANOVA) because I felt that at- 


for each condition, are presented in Table 2.
Table 2
Means Grouped by Anxiety and Stress Level

                               Anxiety (A) by stress (S) grouping       Results of analyses, F(1, 64)
                               Low A-   Low A-   High A-  High A-       Anxiety      Stress
Measure                        Low S    High S   Low S    High S        effects      effects      A X S
Anxiety rating                 2.59     3.41     4.65     6.16          28.84****    4.10**       .33
Feelings about ability         5.15     6.17     5.47     3.78          4.96**       1.02         2.96*
Feelings about self            7.00     7.06     6.00     3.76          20.00****    5.15**       5.75**
Pleasantness of task           6.06     6.35     5.59     3.76          6.30**       1.57         3.02*
Performance                    4.53     5.29     5.65     3.29          .48          1.55         5.96**
Estimated % time on task       88.82%   81.76%   71.65%   60.00%        17.68****    10.04***     1.84
Anxiety interference           .29      -.35     -.24     -.82          11.74***     9.71***      29.69****
Emotionality                   .60      .48      1.48     2.61          [illegible]  2.07         3.06*
Worry                          2.08     2.13     3.87     5.88          29.43****    4.09**       3.68*
Task-generated interference    3.71     3.22     4.88     6.47          17.79****    1.11         3.90*

* p < .10.
** p < .05.
*** p < .01.
**** p < .001.

Since the number of items in the interference
scales differed, each mean was corrected by
dividing it by the number of items in that
scale. This correction made sources of in-
terference directly comparable. All be-
tween-group comparisons were made by
Duncan’s multiple-range tests.


Reactions to Testing


Differential reactions to testing were assessed
by comparing ratings of (a) anxiety, (b)
feelings about ability, (c) feelings about self,
and (d) task pleasantness. For the anxiety
ratings, main effects were found for both
anxiety and stress. Pairwise comparisons
revealed that the low-anxiety-low-stress
group was less anxious than either high-
anxiety group (ps < .01) and that while the
low-anxiety-high-stress and high-anxiety-
low-stress groups did not differ, both were
less anxious than the high-anxiety-high-
stress group (ps < .05). Analysis of feelings
about one's abilities ratings revealed a main
effect for anxiety level and an interaction
which approached significance. The high-
anxiety-high-stress group felt more nega-
tively about their abilities than did other
groups (ps < .05), which did not differ in
their feelings about abilities. Similar, but
stronger, effects were found for feelings
about one’s self; significant effects were
found for anxiety, stress, and the interaction.
Again the high-anxiety-high-stress group
felt worse about themselves than did other
groups (ps < .01), while other groups did not
differ. On ratings of task pleasantness a
main effect for anxiety was found, and the
Anxiety X Stress interaction approached
significance. The high-anxiety-high-stress
group found the task significantly more un-
pleasant than other groups (ps < .05), which
again did not differ from one another. Thus,
on all measures high-anxiety-high-stress
subjects reacted more negatively to and were
more stressed by testing; they were more
anxious, had lower perceptions of themselves
and their abilities, and found the task more
noxious than did other groups. Addition-
ally, the high-anxiety-low-stress and low-
anxiety-high-stress groups did not differ on
any of these variables.


Performance 


Only the Anxiety X Stress interaction was 
significant for performance. Between-group 
comparisons showed that the high- 
anxiety-low-stress and the low-anxiety- 
high-stress groups solved more anagrams 
than the high-anxiety-high-stress group (ps 
< .05). Performance of the low-anxiety- 
low-stress group did not differ significantly 
from that of other groups. 


Perceived Time on Task 


Analysis of ratings of estimated time on 
task revealed main effects for anxiety and
stress. While the low-anxiety-low-stress
group did not differ significantly from the 


Interference 


Analysis of subject ratings of the influence 
of anxiety on performance yielded effects for 
anxiety, stress, and the interaction. Be-
tween-group comparisons revealed that the


on interference ratings, both reported less 
interference than the high-anxiety-high-
stress group (ps < .05).

Emotionality, worry, and task-generated
interference were analyzed as possible con- 
tributors to 


than either the high-anxiety-low-stress (ps 
< .05) or the high-anxiety-high-stress (ps< 
high-anxiety-low-stress 
group also reported less interference due to 
worry than the high-anxiety-high-stress
condition (p < .01).
significance. While the low-anxiety groups
did not differ in the extent to which task parameters
interfered with their performance, the low-
anxiety-high-stress group reported signifi-
cantly less task-generated interference than
did either high-anxiety-low-stress (p < .05)
or high-anxiety-high-stress (p < .01) groups;
the low-anxiety-low-stress group reported
less such interference than did the high-
anxiety-high-stress group (p < .01). The
high-anxiety-low-stress subjects, too, ex-
perienced less task-generated interference
than did the high-anxiety-high-stress group
(p < .05).
the high-anxiety-high-
stress group experienced significantly
greater interference from anxiety than other
Emotionality, worry, and task-


tributory as significantly greater levels of 
each were found in the high-anxiety-high-
stress group. However, two-tailed tests for
related measures showed that average worry
and task-generated interference ratings were
higher than average emotionality ratings,
ts(16) = 5.53 and 4.99, ps < .001, suggesting
that worry and task-generated interference 
contributed more heavily than emotional- 
ity. 
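
The related-measures comparison reported above can be illustrated with a minimal sketch of a paired t test. The function name `paired_t` and the ratings below are hypothetical and are not the study's data; the reported ts(16) imply 17 subjects per comparison, whereas this example uses only five for brevity.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a related-measures t test on paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    # Work on within-subject differences, since each subject gives both scores.
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # Standard error of the mean difference (sample SD uses n - 1).
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical per-item ratings for five subjects (illustration only).
worry = [5.0, 6.5, 4.0, 7.0, 6.0]
emotionality = [2.0, 3.0, 2.5, 3.5, 2.0]

t, df = paired_t(worry, emotionality)
print(f"t({df}) = {t:.2f}")
```

The resulting t is compared with the two-tailed critical value for the given degrees of freedom. Dividing each scale mean by its number of items, as described earlier, is what makes such a comparison between scales of different length meaningful.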


Discussion 


While significant correlations were noted
among dependent measures, the results 


negatively to testing; (b) solved fewer anag-
rams; (c) spent less time on task; (d) experi-
enced

and task-generated interference than did
high-anxiety-low-stress or low-
Thus, for the

elements of the task irrelevant to efficient
problem solution,

The low-stress condition, however, did not
elicit this dysfunctional pattern of attention.
The high-anxiety-low-stress and low-anxi- 
ety-high-stress groups did not differ signif- 
icantly on performance; self-report data 
suggested that in most other ways the two 
groups were similar. They differed only on 
worry and task-generated interference, on 
which the high-anxiety-low-stress group 
reported more worry and task-generated 
interference. However, neither worry nor 
task-generated interference was sufficient 
to actively interfere with performance, 
suggesting that interference may have to 
reach some critical level or threshold before 
performance begins to deteriorate signifi- 
cantly. Thus, with the additional notion of 
a threshold of interference, attentional 
theory accounts for the comparable perfor- 
mance of the high-anxiety-low-stress and 
low-anxiety-high-stress groups. 

The results for the low-anxiety-low-stress 
group, however, were puzzling for attentional 
anxiety theory. The low-anxiety-low-stress
group reported less interference from anxi- 
ety and greater estimated time on task than 
did other groups, which, from an attentional 
perspective, should have resulted in superior 
performance. In fact, the performance of 
the low-anxiety-low-stress group was not 
significantly different from other groups, 
including the high-anxiety-high-stress 
group. 

There are a number of ways to explain this 
finding. It may be that the low-anxiety- 
low-stress group experienced a source of in- 
terference not tapped by the scales em- 
ployed. If this were the case, the estimated 
time on task should have been lower. An- 
other more plausible alternative is a moti- 
vational interpretation. The low-stress in- 
structions simply may not elicit maximal 
motivation to perform. Since these in- 
structions tell subjects not to worry, low- 
anxious subjects may take the instructions 
literally and not be aroused enough to per- 
form to capacity. At the same time they 
might continue to focus on the task and not 
be distracted by anxiety-related interfer-
ence. That is, high stress may serve a facil-
itating (Alpert & Haber, 1960) or activating
(Hebb, 1972) function for the less anxious, 
who without the increased arousal do not 
perform as well. This conjecture, however, 
must await further research which clarifies
the performance, motivational, and atten- 
tional patterns of low-anxious subjects in 
low-stress conditions. 

One finding of additional interest was that 
the high-anxiety-high-stress group reported 
significantly more worry and task-generated 
interference than emotionality. While 
task-generated interference was a concept 
introduced in this study, previous research 
has shown worry to be more important than 
emotionality (Deffenbacher, 1977; Doctor & 
Altman, 1969; Morris & Liebert, 1970). 
Thus, evaluative stress appears to elicit a 
tendency for the highly anxious to become 
preoccupied with worrisome cognitions and 
task irrelevancies and only secondarily, with 
heightened physiological arousal and 
upset. 

While these latter results are essentially 
correlational, they offer some suggestion for 
anxiety-reduction interventions. Since 
worry and task-generated interference were 
the greatest sources of interference, pro- 
grams containing cognitive restructuring of 
worrisome thoughts and training in task- 
oriented self-instruction hold the greatest 
promise (e.g., Hahnloser, 1974; Holroyd, 
1976). Interventions which add self-man- 
aged relaxation (e.g., Hahnloser, 1974; 
Meichenbaum, 1972) may improve effec- 
tiveness as skills for the reduction of 
heightened emotionality are added. 
A Picture Is Not Always Worth a Thousand Words:
Pictures as Distractors in Reading 


Dale M. Willows 
University of Waterloo, Canada 


Two related studies were conducted to examine the influence of pictures in the 
periphery on children's speed and accuracy of reading. In both studies, chil- 
dren were required to read sets of words under each of three conditions: with 
no pictures, with related pictures, and with unrelated pictures. The two stud- 
ies differed in the age of the children (second versus third graders) and in the 
location of the pictures (behind versus above the words). In both studies, the 


results consistently showed the following: (a) The words were read more
slowly whenever pictures were present. (b) Unrelated pictures produced
more interference than related pictures. (c) The magnitude of both of these
effects was inversely related to reading ability.


Reading has been taught for generations 
using primers containing pictures (Huey, 
1908/1968). There has been a recent trend 
to give the illustrations in children's readers 
even greater prominence than they have had 
in the past. The books used for teaching 
reading in the early grades of school often 
allot more space to the artwork than to the 
printed text. In some cases, pictures even 
constitute the background on which the 
words are printed. Despite their ever-in- 
creasing salience, the role of pictures in 
children's beginning readers has been little 
researched (Gibson & Levin, 1975; Samuels, 
1970). 

Opinions among educators and authors of 
children's beginning readers about the value 


of illustrations range from those who believe
that pictures serve an essential function in 
the instructional process to those who be- 
lieve that pictures serve no useful purpose 


This research was undertaken with the assistance of 
a Grant-in-Aid of Educational Research from the On- 


deepest thanks for their cooperation and kindness.


and that they may interfere with children’s 
learning to read (Aukerman, 1971; Chall, 
1967). Other than on the testimonials of 
practitioners, however, there is very little 
basis on which to judge whether the illus- 
trations in their books are beneficial or det- 
rimental to children’s reading. 

Samuels (1967) has demonstrated that the 
practice of systematically pairing new words 
with identifying pictures interferes with 
children’s learning to recognize those unfa- 
miliar words in isolation. An equally im- 
portant question concerning the use of il- 
lustrations in children’s beginning readers 
is whether the presence of pictures on the 
page affects children’s performance when 
they are reading words which they can al- 
ready decode. The studies reported here 
were concerned with this latter issue. 

The possibility that the presence of 
printed words in peripheral vision might 
interfere with reading performance was ex- 
amined in a study of selective reading (Wil- 
lows, 1974). In that research Willows dem- 
onstrated that there is an inverse relation 
between reading ability and susceptibility to 
visual distraction. Poor readers made more 
errors and read more slowly when words 
were introduced between the lines of the 
passage they were required to read. Good 
readers’ performance was not impaired by 
the presence of verbal information in pe- 
ripheral vision. On the basis of those find- 
ings, it might be expected that pictures in 
peripheral vision would interfere with
reading performance and that the degree of 
susceptibility to distraction by pictures 
might be related to proficiency in reading. 

Willows also showed in the research on 
selective reading that the extent to which an 
individual processes the meaning of verbal 
information in peripheral vision is directly 
related to reading ability. In contrast to 
poor readers’ performance, good readers’ 
memory for the content of a passage can be 
influenced by placing semantically relevant 
words in peripheral vision. These findings 
might lead one to expect that the degree of 
relatedness between printed words and 
pictures in peripheral vision would also af- 
fect the reading performances of good and 
poor readers differentially. 


Experiment 1 


The goals of the first experiment were (a) 
to examine the influence of background 
pictures on the speed and accuracy of chil- 
dren’s reading, (b) to determine whether the 
semantic association of a background picture 
to the word printed on it affects reading 
performance, and (c) to relate susceptibility 
to distraction by background pictures to 
reading ability. 


Method 


Subjects. Parental consent forms were distributed
in three second-grade classrooms in a middle-class el- 
ementary school in Waterloo County, Canada. After 
attrition, the final sample included 32 children (16 boys 
and 16 girls) ranging in age from 6 years 8 months to 8 
years 7 months. 

Standardized test scores. The Gates-MacGinitie 
Reading Tests (Primary B, Form 1) were administered 
according to standard procedures to each of the three 
second-grade classrooms (Gates & MacGinitie, 1965).
Both the vocabulary and comprehension subtests were 
included. Raw scores were converted to standard scores 
using the tables provided with the test. A Pearson 
correlation of each child’s vocabulary and comprehen-
sion scores indicated that performance on the two dif-
ferent subtests shared a considerable amount of com-
mon variance, r(30) = .82, p < .001. Therefore, a
composite reading score was calculated by averaging
each child's vocabulary and comprehension standard
scores.

Test materials. There were three sets of test mate- 
rials. In a control set, 75 different first- and second- 
grade nouns were printed 15 per page on 5 pages. All 


of the words were printed in lowercase primary type in
three rows of 5 words each on 8½ X 11 in. (22 X 28 cm)
pages. In a related-picture condition, the same five sets
of 15 words were used but the order of the words on each 
page was randomized and each of the words was su- 
perimposed on a picture which was an associate of it. 
The word cat, for example, was printed on a simple line 
drawing of a dog. In an unrelated-picture condition, 
the same five sets of words and pictures as in the re- 
lated-picture condition were used, but the words and 
pictures on each page were rearranged so that each word 
was superimposed on a picture which was not an obvious 
associate of it. The word cat, for example, was printed 
on a picture of a lemon. 

Procedure. A female experimenter tested each child 
individually in an empty classroom in the school. The 
15 pages of test materials (5 sheets from each of the 
three conditions) were administered to all of the 32 
children, with each subject serving as his own control.
The presentation order of the 15 sheets was randomized.
The individual testing sessions each lasted for ap-
proximately 20 min.

The session began with a practice period during which
each child was taught through modeling and instruction 
to read aloud, quickly and accurately, the 15 words on 
two specially prepared practice pages. He was then told 
that sometimes the experimenter would try to trick him 
by putting some pictures on the page, but that he should 
not pay any attention to them and should simply read 
the words aloud quickly and without error as he had 
already done. The child was then given two additional 
pages of practice with words superimposed on pictures. 
None of the words or pictures used in the practice ses- 
sion were the same as those used in the actual test ma- 
terials. 

It was clear that all of the children understood that 
the pictures were there to distract them and that they 
should try to ignore them. As soon as the practice trials 
were completed, the test materials were introduced 
without a break in the procedures. On each trial, 1 of
the 15 pages was placed face down on the table in front
of the child and he was told to get ready. The page was
then turned face up and he was told to begin. 

A stopwatch was used to measure the time (to the 
nearest second) that it took each child to read the 15 
words on a page. All sessions were tape recorded; 
reading errors were scored later from the tapes. An 
error was scored when a child said a nonword, when he
said a word other than the one on the page, or when
he was told a word after 10 sec had elapsed.


Results 
Table 1
Means and Standard Deviations of Task Time
and Errors for Experiment 1

                    Picture-word relationship
               No           Related      Unrelated
Variable       pictures     pictures     pictures

Time (sec)
  M            86.59        103.81       120.25
  SD           53.76        61.82        74.64
Errors
  M            6.53         5.56         7.28
  SD           6.03         5.38         6.62



under the three treatment conditions. 
Table 1 shows the means and standard de- 
viations for both the time and error mea- 
sures. 

Reading time. The ANOVA on reading 
time showed that there was a main effect of 
reading condition, F(2, 60) = 41.88, p < .01.
Tukey’s test for honestly significant differ- 
ences (HSD) as described by Myers (1972) 
was used to compare the mean reading times 
under the three reading conditions (control, 
related picture, unrelated picture). The 
results indicated that compared with their 
control levels, children read more slowly 
when either related or unrelated pictures 
were present (p < .01) and that unrelated
pictures produced more interference than
did related pictures (p < .01).

A further ANOVA on reading time was
performed to determine whether the effect 
of the three reading conditions was the same 
for each of the five sets of words. There was 
a significant main effect for word set, F(4,
124) = 25.42, p < .01, indicating that some
of the pages of words were more difficult 
than others. An examination of the means, 
however, showed that the relationship be- 
tween each of the three reading conditions 
was almost identical for the five sets of 
words. Hence, the results were essentially 
replicated with five different sets of pictures 


To examine the relation of reading ability 
Pearson corre- 


of interference caused by related and unre- 
lated pictures. These two interference 
scores were calculated by subtracting each
child’s reading time under the control con- 
dition from his reading times under each of 
the two picture conditions. The Pearson 
correlations showed that for these stimulus 
materials, there was a significant inverse 
association between reading ability and in- 
terference by related pictures, r(30) = —.43, 
p < .01, and between reading ability and
interference by unrelated pictures, r(30) = 
—.59, p < .01. 

To determine the extent to which reading 
ability was associated with interference from 
the meaning of the pictures, difference scores 
were calculated between the reading times 
in the unrelated- and related-picture con- 
ditions. A Pearson correlation of these 
difference scores and reading ability indi- 
cated that the greater interference from 
unrelated pictures compared with related 
pictures was more marked for poorer read- 
ers, r(30) = —.51, p <.01. That is, the less 
skilled readers were more influenced in their 
reading performance by the semantic rela- 
tion between the word and the background 
picture than were the more proficient read- 
ers. 
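
The interference and difference scores described above can be sketched as follows. All values are hypothetical and chosen only to illustrate the computation; `pearson_r` is an illustrative helper, not a procedure named in the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical composite reading scores and reading times (sec) for five children.
reading_ability = [40, 45, 50, 55, 60]
control_time = [120, 100, 90, 80, 70]
related_time = [150, 122, 105, 92, 78]
unrelated_time = [175, 140, 118, 100, 84]

# Interference = time with pictures minus time under the no-picture control.
related_interference = [p - c for p, c in zip(related_time, control_time)]
unrelated_interference = [p - c for p, c in zip(unrelated_time, control_time)]
# Extra cost attributable to the semantic mismatch between word and picture.
semantic_difference = [u - r for u, r in zip(unrelated_time, related_time)]

for label, scores in [
    ("related", related_interference),
    ("unrelated", unrelated_interference),
    ("unrelated - related", semantic_difference),
]:
    print(f"{label}: r = {pearson_r(reading_ability, scores):.2f}")
```

A negative coefficient in this sketch corresponds to the inverse association reported in the text: the lower the reading score, the larger the slowdown caused by the pictures.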

Reading errors. The ANOVA on reading 
errors showed that there was a main effect of 
reading condition, F(2, 60) = 7.82, p < .01.
Multiple comparisons of the mean numbers 
of errors using Tukey’s HSD test indicated 
that although neither of the picture condi- 
tions differed from control reading, there 
were more reading errors under the unre- 
lated- than under the related-picture con- 
dition (p < .01). Compared with control 
performance, related pictures improved ac- 
curacy of decoding, while unrelated pictures 
interfered with it. 

An ANOVA was performed to assess the 
reliability of these results on reading errors 
across the five different sets. This analysis 
demonstrated, as the time measure had, that 
the word sets differed in difficulty, F(4, 124) 
= 10.03, p <.01, and also that there was an 
interaction between reading condition and 
word set, F (8, 248) = 2.27, p <.05. An ex- 
amination of the means showed, however, 
that there were consistently more reading 
errors in the unrelated- than in the related- 
picture condition in four out of the five word 
sets and that in the fifth set there was es-
sentially no difference. Furthermore, in
four out of the five sets of words, the related
pictures improved accuracy of decoding
relative to control performance.

The correlations of reading ability and 
interference from pictures, as reflected by 
errors in decoding, were generally low. The 
only significant Pearson correlation coeffi- 
cient was between reading ability and sen- 
sitivity to the semantic relation between a 
word and the picture on which it was printed. 
The less skilled readers made more errors
while reading words printed on unrelated
than on related pictures, r(30) = -.39, p <
.05. The better readers were little affected
in their reading accuracy by the semantic
relation between the words and the pictures 
on which they were printed. 

Of the reading errors made, only a very 
small number, a total of 5 in the related- and 
2 in the unrelated-picture conditions, were 
overt intrusions from the background pic-
tures. These results were consistent with
those of Rosinski, Golinkoff, and Kukish
(1975), who, with a somewhat similar task,
found no overt intrusions from pictures with
second-grade, sixth-grade, and adult
subjects.


Discussion


Background pictures clearly affected read-
ing speed and accuracy in this study

that less skilled readers were more suscep-
tible than better readers to the distracting
effects of background pictures.

The overall detrimental effect of back-
ground pictures on reading speed in Exper-
iment 1 could well have been attributable to
some obvious physical difference (such as
legibility) between the control and back-
ground-picture conditions. Thus, although
the findings strongly suggested that books
which use illustrations as the background for
the text are affecting many children's read- 
ing performance adversely, they were not 
generalizable to the more usual situation in 
which the pictures are placed above or sur- 
rounding the text rather than as a back- 
ground for it. 


Experiment 2 


Experiment 2 was undertaken in an at- 
tempt (a) to replicate Experiment 1 using 
third-grade children as subjects, (b) to de- 
termine whether interference by pictures 
would be reduced when the words were 
printed below the pictures rather than di- 
rectly on top of them, and (c) to determine 
the extent to which the reading-ability ef-
fects in Experiment 1 might be accounted for
by more general intellectual factors.


Method 


Subjects. Letters requesting parental permission to 
participate in “a study of factors that influence reading” 


Waterloo County, Canada. 
groups were constituted as follows: (a) the pictures- 
behind condition included 34 children, 19 girls and 15 


boys, in age from 7 11 
9 ram f d years 11 months ire 


condition 
32 children, 16 girls and 16
Ms ran ane 


Standardized test scores. Since the scores from
standardized tests are not typically
two forms each 


task. The 
standardized tests were not scored
rela o£ had been completed, "The tarii 
us 71 Q4 e Sores on the two forme ofthe at 
Gatos Maik? € 41) and on the two forms as AT 
64, p < 01). The higher score each child ahoo di 

and 

reading TQ wat selected as the best estimate of 
reading 
(df = 64, p < 01), 
of the i 


guson, 1971). 
of a pictures-above condition in which
the words were printed ½ in. (1.27 cm) below rather
than on top of each picture. Overall, the design
between variable


Results
The results of a preliminary analysis in-


either dependent measure (F <1). There- 
fore, sex of subject was not included in any 
further analyses. 

on the between variable (picture location) 
was also found on both time and errors in 
this initial analysis, but a comparison of the 
means using Tukey's HSD test (Myers, 
1972) indicated that there were significant 
differences on both time (p < .01) and errors
(p < .01) under the no-picture control con-
dition, which was identical for the two pic-

Since the subjects 


t were com! " 
reading comprehension, and IQ scores for 


Means and Standard Deviations on Age, 
Reading, and IQ in Experiment 2
Picture location 
Variable (n= 32) (ne) — ! 
Reading standard 
Cone) 
M 59 se i0 
v Culture Fair) > a 
ae 1154 100.5 207 
sD 96 147 
months) d 
x : 103.3 1014 
44 50 
Table 3
Corrected Means and Standard Deviations for
Picture-Location Groups in Experiment 2

                              Picture-word relationship
Picture-location         No           Related       Unrelated
group                    pictures     pictures      pictures

Pictures above (n = 32)
  Time
    M                    68.39        16.21         79.30
    SD                   26.46        30.53         36.06
  Errors
    M                    4.33         [illegible]   4.71
    SD                   3.58         4.18          [illegible]
Pictures behind (n = 34)
  Time
    M                    70.06        83.85         68.92
    SD                   53.55        57.98         4.04
  Errors
    M                    3.89         3.48          [illegible]
    SD                   4.88         5.09          5.75


The analysis of covariance indicated that
reading ability was a significant factor for both
time, F(1, 63) = 61.65, p < .01, and errors,
F(1, 63) = 84.09, p < .01, and that there was
no overall effect of picture location on either
measure after the effect of reading ability


Picture-word interference. As in Ex-
periment 1, a purpose of Experiment 2 was
to examine the amount of interference to
reading performance caused by pictures.
Since the interest in examining interference 
was a priori, t tests for correlated samples
(Ferguson, 1971) were used to compare
reading speed and accuracy when related or
unrelated pictures were present with the 
no-picture control performance. In the 
pictures-behind condition, the results were 
very similar to those obtained in Experiment 
1. There was significant interference on 
reading time in the related-pictures condi- 
tion, t(33) = 4.91, p < .01, two-tailed test, 
and in the unrelated-pictures condition, 
t(33) = 4.19, p < .01, two-tailed test. The 
difference in the amount of interference 
caused by unrelated and related pictures, 
although in the same direction as in Exper- 
iment 1, was not significant, t(33) = 1.43. 

The pattern of results for the group that 
had the pictures above the words rather than 
behind them was very similar on the read- 
ing-time measure. Both related pictures 
and unrelated pictures caused significant 
interference, t(31) = 3.85, p < .01, two-tailed 
test, and t(31) = 3.61, p < .01, two-tailed
test, respectively. The difference between 
the interference caused by unrelated and 
related pictures was not significant under 
this condition either, t(31) = 1.23. 

On the reading-errors measure, the only 
significant result was that in the pictures- 
behind group there were more errors in the 
unrelated- than in the related-picture con-
dition, t(33) = 2.84, p < .01, two-tailed test.
This result was similar to Experiment 1’s 
finding that relative to the no-picture control 
condition, unrelated pictures tended to in- 
crease errors and related pictures tended to 
reduce them. 

As in Experiment 1, Pearson correlations 
between reading standard scores and pic- 
ture-word interference scores were calcu- 
lated. In addition, partial correlations were 
calculated partialing out the variance ac-
counted for by nonverbal intelligence. The
zero-order correlations and the partial cor- 
relations are shown in Table 4. The zero- 
order correlations with third graders were 
lower than those in Experiment 1 with sec- 
ond graders, but there were still significant 
relationships between interference from 
both related and unrelated pictures and 
reading ability. Less skilled readers read 
more slowly and made more errors when 
there were pictures in peripheral vision than 
under control conditions. Furthermore, the 
unrelated pictures had a greater interfering 
Table 4
Correlations Between Reading Ability and
Interference Scores (Partial Correlations
Controlling for IQ Are in Parentheses)

Variable                                 Time (in sec)      Errors
Related pictures - no pictures           -.29a (-.16)       -.30a (-.27)a
Unrelated pictures - no pictures         -.42b (-.30)a      -.36b (-.26)
Unrelated pictures - related pictures    -.31b (-.26)a      -.08 (.01)

a p < .05, two-tailed test.
b p < .01, two-tailed test.


effect than did the related pictures for the 
poorer readers. 

The partial correlations indicated that
although nonverbal intelligence partially 
accounted for the above effects, there was 
also a significant amount of the variance 
attributable to reading ability. 
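
The partialing step described above can be illustrated with a short sketch. It assumes the standard first-order partial correlation formula and uses made-up scores; the variable names (`reading`, `interference`, `nonverbal_iq`) and all values are hypothetical and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, holding z constant."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical scores for six children (illustration only, not study data).
reading = [85, 92, 100, 104, 110, 118]           # reading standard scores
interference = [9.5, 8.0, 7.2, 7.5, 5.1, 4.8]    # picture-word interference (sec)
nonverbal_iq = [90, 95, 105, 98, 108, 115]       # nonverbal intelligence scores

zero_order = pearson(reading, interference)
partial = partial_corr(reading, interference, nonverbal_iq)
print(f"zero-order r = {zero_order:.2f}, partial r = {partial:.2f}")
```

If the partial coefficient stays clearly different from zero, part of the relationship between reading ability and interference cannot be attributed to the controlled variable, which is the pattern the text reports.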


General Discussion 


The results of Experiment 2 confirmed 
and extended those of Experiment 1.
Third-grade children, like the second graders,
read more slowly in the presence of pictures
than when no pictures were available
in the periphery. As would be expected, the 
task was easier for the third graders. An 
examination of the mean reading times and 
errors across the two experiments showed 
that, compared with third graders, second 
graders were slower and less accurate on the 
task in the control (no-pictures) condition. 
Furthermore, the second graders suffered 
more interference in the presence of pictures.
The correlations between picture-word in- 
terference and reading ability were similar 
to those for second graders, but they were 
lower for the third graders.

The location of the pictures was not a
real determinant of their interfering effect.
Although the means showed that the
[amount] of interference by pictures was reduced
when the pictures were moved from
behind to above the words, this difference
was not statistically significant.

Nonverbal intelligence (as measured on a
[test name illegible]) did account for part of the
reading ability effect. Children of lower
intelligence were more reliant on the pictures
than were brighter children. Even when the 
effects of intelligence were partialed out, 
however, there remained a significant 
amount of the variance which was due to 
reading ability. 

Taken together, then, the pattern of re- 
sults of both experiments is clear and con- 
sistent: Pictures in the periphery do affect 
children’s speed and accuracy of reading; the 
size of the interfering effect of the pictures 
depends on their relevance to the words 
printed near them; younger, less skilled 
readers are more susceptible to these influ- 
ences. 

Some parallels do exist between the 
present results and the findings in other 
studies. There have been a number of 
studies of picture-word interference in the 
literature, but most of them have been de- 
signed to assess the processing of words in a 
picture-naming situation. The previously 
mentioned study by Rosinski et al. (1975) 
did, however, involve word-naming (reading) 
conditions in which pictures were the po- 
tential distractors. Although there were 
important differences between their para- 
digm and the present one which make a di- 
rect comparison of the results impossible, the 
findings of Rosinski et al. (1975) did appear 
to show a decrease with age in the effect of 
interfering stimuli parallel to the one found 
here, but over a much broader age span 
(second grade, sixth grade, and adults). 
Furthermore, the present finding of a greater 
susceptibility to distraction by pictures 
among the less skilled readers parallels the 
results of Samuels (1967) in his studies on 
the influence of identifying illustrations on 
the acquisition of new words. Samuels’s 
(1967) “principle of least effort” interpre- 
tation may also explain why pictures had an 
interfering effect here and why poor readers 
were more vulnerable to that effect. 

The finding here that unrelated pictures 
produced more interference than did related 
pictures was one of the more important and 
interesting results of these studies. A very 
reasonable interpretation of these data is 
that the children either consciously or automatically
use the pictures as clues to the meanings of 
the words printed on or near them, as many 
methods of beginning reading teach them to
do. (This practice is obviously inefficient if,
as the control condition showed, the child is 
already capable of decoding the word in the 
absence of pictures.) If they did consult the 
pictures for clues to word meanings, then the 
information provided by the unrelated pic- 
tures would have been misleading, while the 
information provided by the related pictures 
would have directed them in most cases to 
the appropriate semantic category. In 
processing the unrelated and related pic- 
ture-word combinations lemon-cat and 
dog-cat, for example, if the child said the 
name of the picture to himself (or automat- 
ically processed the meaning of the picture) 
before (or at the same time as) attempting to 
decode the word printed on it, he would be 
expected to have easier access to the mean- 
ing of the to-be-decoded word (cat) if he had 
just processed the meaning of the related 
picture (dog) rather than of the unrelated 
picture (lemon). 

In most books used for reading instruc- 
tion, the illustrations which accompany a 
story are complex and include representations
of many components of the text. If a
child comes to a word he already knows, then 
the pictures in the periphery are superfluous 
and probably distracting. If he does not 
know a word and looks to the picture for a 
clue to its meaning, he may well be misled by 
those aspects of the picture which are not 
closely related to the meaning of the particular
word he is trying to decode.

Although, in the present study involving 
speeded word recognition, pictures did im- 
pede children’s decoding, it is quite con- 
ceivable that pictures would have facilitative 
effects on other types of reading perfor- 
mance. At this point, there is a clear need 
for further research investigating the con- 
ditions under which pictures contribute to 
or detract from children’s attempts to rec- 
ognize words and to comprehend text. 
The social environments of 19 high school classes were related to student
absenteeism rates and to the average final grades given by the teacher. Classes
with high absenteeism rates were seen as high in competition and teacher control
and low in teacher support. Classes in which teachers gave higher average
grades were seen as high in involvement and low in teacher control. The results
are discussed in light of their implications for understanding the differential
effects of classes, as well as for identifying and changing high-risk classroom
environments.


Several recent studies have focused on 
assessing the characteristics and impacts of 
the social environments of classrooms 
(Nielsen & Kirk, 1974; Randhawa & Fu,
1973). Different assessment procedures
have been used to describe the frequency of 
problems observed in elementary school 
classes (Barclay, 1974), to discriminate be- 
tween regular and experimental physics 
classes (Welch & Walberg, 1972) and be- 
tween rural and urban classes (Randhawa & 
Michayluk, 1975), and to characterize classes 
for regular and for gifted students (Steele, 
House, & Kerins, 1971). The underlying 
assumption of this line of research is that 
environments exercise an important influence
over their members. Classroom environments
are thought to have certain demand
characteristics which influence student
growth and development.

For example, Walberg (1969) related the 
Learning Environment Inventory subscales 
to cognitive criteria such as understanding
science and knowledge of physics and to
noncognitive criteria such as interests in 
physics. Students in classes seen as more 
difficult and competitive gained more on 
physics achievement and science under- 
standing, whereas those in classes seen as 
more satisfying gained more on reported 
science interest and activities. 

Trickett and Moos (1974) used the 
Classroom Environment Scale (CES) to link 
student satisfaction and mood to the social 
environment of high school classrooms. 
Students expressed greater satisfaction in 
classrooms characterized by high student 
involvement and affiliation, by innovative 
teaching methods, and by clarity of rules 
regarding classroom behavior. Classrooms 
in which students reported a great deal of 
content learning combined an effective 
concern with students as people with an 
emphasis on students working hard for ac- 
ademic rewards (competition) within an 
organized context. These results suggest 
that classrooms must be intellectually chal- 
lenging to encourage growth in achievement 
and understanding as well as cohesive and 
satisfying to encourage student interest and 
motivation. 

Several investigators have recently em- 
phasized the importance of focusing on ed- 
ucational outcomes other than those as- 
sessed by traditional achievement tests, for 
example, cognitive preferences (Tamir,
1975), satisfaction with school (Epstein &
McPartland, 1976), and continuing interest 
and motivation to learn (Maehr, 1976). The 
purpose of this study was to focus on two 
such variables that we thought were related
to the classroom environment, that is, the 
student absenteeism rate and the average 
class grade. These variables were chosen 
because of their practical importance, be- 
cause they are relatively objective (or at least 
readily ascertainable) measures of actual 
behavior, and because some evidence indi- 
cated that they were probably related to the 
learning environment. For example, having 
a pupil-oriented teacher may contribute 
significantly to improved attendance for 
black students (St. John, 1971). In addition, 
either of the variables may mediate other 
types of classroom outcomes, such as 
achievement, satisfaction, dropping out, and 
the like (Epstein & McPartland, 1976). 
The absenteeism rate is a particularly 
important intermediate outcome variable, 
since students are less likely to be affected 
by classrooms they attend less frequently. If 
students are absent they cannot avail
themselves of relevant learning opportuni- 
ties and lose the continuity of course content 
which is crucial for learning (Morgan, 1975; 
Karweit, Note 1). Students who attend 
classes less regularly earn lower grades 
(Kooker, 1976; Rozelle, 1968) and may show 
less-than-expected learning gains (Jenne, 
1973). Skipping school (truancy) has been 
related to self-reported delinquency 
(Walberg, 1972); the absence rates of high 
school dropouts may be elevated for several 
years prior to their dropping out (Yudin, 
Ring, Nowakiwska, & Heinemann, 1973). 
Student absenteeism is partly a function 
of physical symptoms and medical illnesses. 
There is considerable evidence that charac- 
teristics of social environments are related 
to these sets of variables (e.g., Kiritz & Moos, 
1974). For example, student living groups 
characterized by high student complaints of 
physical symptoms were perceived as com- 
petitive and as low in involvement, support, 
and student influence (Moos & Van Dort, in 
press). Military basic training companies 
with high sick call rates emphasized strict 
organization and officer control and de- 
emphasized the enlisted man’s personal 
status (Moos, 1974). Related research has
shown that absenteeism in work settings is 
high when there is inadequate or poor com- 
munication among employees and between 
employees and supervisors and when work- 
ers have little autonomy, are not allowed to 
make major decisions about their work, and 
have few opportunities to learn new skills 
(Indik, 1965; Jenkins, 1973). These considerations
led to the prediction that competition
and teacher control are positively
related, whereas involvement, affiliation, 
and teacher support are negatively related 
to absenteeism rates. 

The grading policy of a teacher is also
important in mediating the cognitive and 
noncognitive effects of classroom environ- 
ments. Students who get good grades are 
more likely to be satisfied with school and 
may show more continuing motivation and 
interest in the course content (Epstein & 
McPartland, 1976; Maehr, 1976). There are 
also relationships between the characteris- 
tics of classroom environments and the av- 
erage grades expected by the students in a 
class (Moos, in press). Since students 
probably have realistic ideas regarding 
teachers' grading policies, it seemed likely
that average class grades would be related to 
the classroom environment. We hypothe- 
sized that involvement, affiliation, and 
support are positively related and competi- 
tion and teacher control are negatively re- 
lated to average class grades. 


Method 


Nineteen classes were sampled from one high school
in which students were almost exclusively in a college 
preparatory curriculum. Only one high school was used 
because student absenteeism rates vary among schools 
and our focus was on the correlates of classroom climate.
Sampling from various schools might have confounded
classroom and school effects.

The 19 classes constituted a representative sample 
of the classrooms in the high school. The sampling 
attempted to maximize the number of different stu- 
dents tested; in general, there were very few students 
who were in more than 1 of the 19 classes. The subject 
matter represented included math and algebra, foreign
languages, biology, English, art, and business bookkeeping.
The classrooms were all about the same size.
Careful student absenteeism records were kept for 
students in each class. Students’ final grades were 
obtained at the end of the semester. 




CLASSROOM SOCIAL CLIMATE 


Assessment of Classroom Social Climate 


Information concerning the dimensions on which 
classroom social environments differed was obtained 
from the CES, which was given to the students in each 
class in the middle of the semester. The rationale used 
in developing the CES was that the consensus of individuals
characterizing an environment constitutes a
measure of the social climate of that environment. The 
CES assesses junior high and high school classes as 
perceived by the students and/or teachers. The scale 
consists of 90 items which fall onto nine subscales, each 
of which measures the emphasis on one dimension of 
classroom climate. 

The Involvement, Affiliation, and Teacher Support 
subscales are conceptualized as relationship dimensions 
and assess the extent to which students and teachers 
support and help each other and the degree to which 
they are involved in the class and its activities. The 
second group of subscales are personal growth or goal 
orientation dimensions. They measure the focus on 
specific goals of classroom environments, for example, 
Task Orientation and Competition. The last four 
subscales of Order and Organization, Rule Clarity, 
Teacher Control, and Innovation assess system main- 
tenance and system change dimensions. These sub- 
scales tap information about the structure and organi- 
zation of the class as well as about the processes and 
potential for change in its functioning. 

Further details about the development and correlates
of the CES are given in Trickett and Moos (1973) and
Moos and Trickett (1974). In brief, the CES subscales 
have internal consistencies ranging from .67 to .86, 6- 
week test-retest reliabilities ranging from .72 to .90, and 
average intercorrelations of around .25, indicating that 
they measure distinct albeit somewhat related aspects 
of classroom environments (see Hearn & Moos, in press; 
Kaye, Trickett, & Quinlan, 1977; Moos, in press; 
Trickett & Moos, 1974, for further information about 
the CES). 


Results 


Each of the 19 classrooms was character- 
ized by 9 student and 9 teacher CES subscale 
scores. These 18 scores were correlated with 
the median number of absences per student 
and the mean grades of the students in each 
class. The median absenteeism rate for the 
19 classes was 8.7 absences per student per 
semester (range from 3.9 to 16.0). The mean 
grade for the 19 classes was 4.0 on a 5-point 
scale (SD = .55; range from 2.4 to 4.8). The 
rank order correlation between absenteeism 
and grades was −.45, indicating a tendency
for student absences to be higher in classes 
with more stringent grading practices. 
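
As an illustration of the class-level rank-order correlation reported above, the following sketch computes a Spearman coefficient by ranking each variable (ties receive the average rank) and then correlating the ranks. The five class values are hypothetical, and the function names are ours; the source does not say how ties were handled.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Rank-order correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical class-level data (illustration only, not study data).
median_absences = [3.9, 6.5, 8.7, 11.2, 16.0]
mean_grades = [4.8, 4.1, 4.3, 3.6, 2.4]
print(f"rank-order correlation = {spearman(median_absences, mean_grades):.2f}")
```

A negative coefficient in such data corresponds to the tendency described in the text: classes with lower mean grades tend to have more absences.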

Table 1
Correlations of Classroom Environment Scale (CES) Scores and Absenteeism and Grades

                           Student perceptions       Teacher perceptions
                           Mdn no.     Mean          Mdn no.        Mean
CES subscale               absences    grades        absences       grades
Involvement                −.33         .45*         −.31            .61**
Affiliation                −.15         .45*         −.32            .31
Teacher Support            −.22         .45*         −.54*           .36
Task Orientation            .36        −.10          [illegible]     .01
Competition                 .51*       −.36          −.12            .07
Order and Organization      .46        −.27          −.03            .07
Rule Clarity                .41        −.52*         −.02           −.26
Teacher Control             .56*       −.85**         .39           −.45*
Innovation                 −.30         .24           .18            .11

* p < .05.
** p < .01.

There were substantial relationships between
student and teacher perceptions of the
classroom environment and mean class
grades (see Table 1). For student perceptions,
all three of the relationship dimensions
were significantly positively correlated with 
mean grades. Rule Clarity and Teacher 
Control were significantly negatively corre- 
lated with mean grades. For teacher per- 
ceptions, Involvement was significantly 
positively and Teacher Control, significantly 
negatively correlated with mean grades. 
Thus, both students and teachers perceived 
classrooms with higher average final grades 
to be higher in Involvement and lower in 
Teacher Control. The direction of the re- 
lationship for the other three subscales in 
which student perceptions were significantly 
related to mean grades (Affiliation, Teacher 
Support, Rule Clarity) was the same for 
teachers and students; however, the corre- 
lations for teacher perceptions were not 
statistically significant. 

Table 1 also shows that the student absenteeism
rate was significantly positively
correlated with student perceptions of 
Competition and Teacher Control and sig- 
nificantly negatively correlated with teacher 
perceptions of Teacher Support. Thus, 
classrooms which students perceive as high 
in Competition and Teacher Control (and, 
to a somewhat lesser extent, Rule Clarity) 
and which teachers perceive as low in 
Teacher Support tended to have higher rates 
of student absenteeism.

Additional analyses were conducted for
nine of the classrooms in which records were 
kept separately for medically excused and 
other absences. The median per student 
absenteeism rate for these nine classes 
ranged from 6.0 to 14.0; the median per stu- 
dent rate of medically excused absences 
varied from 3.6 to 7.5. The correlations 
between the classroom social environment 
and the rate of medically excused absences 
were generally similar to those for total absences.
Specifically, the three significant
correlations shown in Table 1 were replicated
(student-perceived Competition and
Teacher Control, r = .59 and .74, respectively;
teacher-perceived Teacher Support,
r = −.54). In addition, the medically excused
absenteeism rate was significantly
correlated with student-perceived Involvement
(r = −.81) and Teacher Control (r = .74).
The rank-order correlation between
the rate of medically excused and other absences
was .61. Thus, although these results
must be viewed as preliminary due to the
small sample of classes (N = 9), the overall
relationships seem to hold for both medically
excused and other types of absences (e.g.,
cutting class).

Table 2
Classroom Environment Scale (CES) Items Significantly Related to Student Absenteeism

                   Scoring direction   Correlation
                   to relate to        with
                   student             student
CES subscale       absenteeism         absenteeism   Item
Involvement        True                 .52*         Students are often clock watching in this class.
Involvement        False               −.58**        Students really enjoy this class.
Teacher Support    True                 .65**        Students have to watch what they say in this class.
Task Orientation   False               −.63**        We often spend more time discussing outside activities than class-related material.
Competition        False               −.64**        Students usually pass even if they don't do much.
Rule Clarity       True                 .48*         There is a clear set of rules for students to follow.
Rule Clarity       True                 .58**        The teacher makes a point of sticking to the rules he's made.
Rule Clarity       True                 .46*         There are set ways of working on things.
Teacher Control    True                 .48*         If a student [breaks] a rule in this class he's sure to [get in trouble].
Teacher Control    False               −.67**        The teacher is not very strict.
Teacher Control    True                 .65**        Students get in [trouble if they're] not in their seats when the class [is supposed to start].
Teacher Control    True                 .60**        It is [easier] to get in trouble here than in a lot of other [classes].
Teacher Control    False               −.48*         The teacher will put up with a good deal.
Innovation         False               −.45*         What students do in class is very different on different days.

* p < .05.
** p < .01.

Since absenteeism is a particularly important
intermediate outcome, a further
analysis was conducted to clarify the specific
aspects of the classroom milieu most closely
related to the absenteeism rate. Each of the
CES items is answered either true or false.
The proportion of students answering true
was calculated for each item for each of the
19 classrooms. Each of the items was then
correlated with the classroom absenteeism 
rate. These correlations were between the 
median per student absenteeism rate and the 
proportion of students who answered the 
item in the true direction. 
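
The item-level procedure just described can be sketched as follows: for one CES item, compute the proportion of students answering true in each class, take each class's median number of absences, and correlate the two across classes. The use of a Pearson coefficient is an assumption (the text does not name the coefficient), and the three small classes below are invented for illustration.

```python
import math
from statistics import median


def proportion_true(responses):
    """Share of students in one class who answered an item 'true'."""
    return sum(1 for r in responses if r) / len(responses)


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_absence_correlation(item_responses_by_class, absences_by_class):
    """Correlate per-class proportion 'true' with per-class median absences."""
    proportions = [proportion_true(r) for r in item_responses_by_class]
    medians = [median(a) for a in absences_by_class]
    return pearson(proportions, medians)


# Hypothetical data for one item in three classes (not study data).
item_responses = [
    [True, False, False, False],
    [True, True, False, False],
    [True, True, True, False],
]
absences = [
    [2, 4, 6],
    [5, 7, 9],
    [8, 10, 12],
]
r = item_absence_correlation(item_responses, absences)
print(f"item-absenteeism correlation = {r:.2f}")
```

Because the class is the unit of analysis, each item contributes one correlation based on as many data points as there are classes (19 in the study).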

The 14 CES items that were significantly 
(p < .05) related to the student absenteeism 
rate are shown in Table 2. Students in 
classes with high absenteeism rates are more 
likely to feel they are often clock watching, 
that they need to be careful about what they 
say, that there are clear and set rules, and 
that it is relatively easy to get into trouble in 
the class. These students are also more 
likely to state that they do not enjoy the 
class, that they cannot discuss outside ac- 
tivities in class, that passing the class is rel- 
atively difficult, and that the teacher is fairly 
strict. 


Discussion 


The average class grades and the student 
absenteeism rate are related to classroom 
social climate. Students and teachers per- 
ceived classrooms in which teachers gave 
higher average grades as high in Involvement 
and low in Teacher Control. Classes with 
high absenteeism rates were seen as high in 
Competition and Teacher Control and low 
in Teacher Support. No causal implications 
can be drawn from our data; in fact, the class 
climate, student absenteeism, and teacher 
grading practices are probably mutually in- 
terrelated in a complex manner. Teachers 
may establish their authority early in the 
development of a class and students then 
quickly learn the implicit and explicit rules 
governing classroom life. Teachers in these 
classes can be more supportive, since they 
have relatively little need to justify their 
authority or to criticize students for their 
behavior (Kaye et al., 1977). Students in 
this type of class are more satisfied, have 
higher morale, and expect higher grades. 
Students are also more likely to earn—and 
teachers to give—higher grades. Con- 
versely, teachers may initially describe their 
grading policies, these policies may affect the 
development of the classroom environment, 
which may, in turn, affect student motivation,
absenteeism, achievement, and final
grades.

These relationships may also be mediated
by student background characteristics and 
the subject matter of the class. For example, 
students in one class may have a higher av- 
erage ability level, be easier to get along with, 
easier to control, or more highly motivated 
and involved than students in another class. 
Students who are conceptually more mature 
and/or who have higher prior achievement 
levels in the course content tend to be more 
satisfied in classrooms high in innovation 
and low in teacher control (Hunt, 1975; To- 
bias, 1976). These students may facilitate 
the development of highly innovative class- 
room environments in which they are more 
productive and earn higher grades. Al- 
though we found no differences in either 
grades or absenteeism rates among classes of 
different subject matter, it is possible that 
classroom climate, grades, and absenteeism 
rates may be affected by variables such as 
course content, grade level, academic versus 
vocational orientation, and the like. These 
considerations indicate that our results must 
be viewed as preliminary and that the rela- 
tive importance of student background and 
school and classroom setting characteristics 
in affecting absenteeism and grades remains 
to be determined. 

The findings raise some new considerations
relevant to understanding the effects
of classroom environments. Both Trickett
and Moos’s (1974) and Walberg’s (1969) re- 
sults suggest that environments must be 
intellectually challenging to encourage 
growth in achievement and understanding. 
Students may learn more in classrooms that 
emphasize competition and difficulty, but 
they are apparently also absent more often 
from these classrooms. Since absenteeism 
is related to poorer grades and/or later 
dropout, an emphasis on competition may 
encourage cognitive growth among some 
students at great personal costs to others. 
This is consistent with Wessman’s (1972) 
finding of increased anxiety and tension in 
some of the disadvantaged high school stu- 
dents who participated in a compensatory 
education project and with the notion that 
some students experience more failure—and 
therefore perform more poorly and are less 
self-assured and less motivated—in com- 
petitive than in cooperative or individualistic 
class settings (Johnson & Johnson, 1974).

The generally similar results of studies 
conducted in quite different social settings 
(i.e., high school classrooms, university stu- 
dent living groups, and military basic train- 
ing companies) should be noted. High-risk 
settings are seen as low in involvement and 
support, as high in competition and task 
orientation, and as high in restrictive control 
(or low in student influence). In this con- 
nection, there is evidence that the nature, 
strength, and availability of social supports 
are protective factors which buffer or cush- 
ion a person from the effects of various psy- 
chosocial life stress factors (Cobb, 1976). 
Thus, an environment high in competition 
and support is likely to have a quite different 
impact from one high in competition but low 
in support (Moos, in press). This suggests 
that social systems intervention should be 
focused on increasing social supports, espe- 
cially in competitive settings. 

Although our work needs replication, the 
prior identification of high-risk classroom 
settings has important practical implica- 
tions. It is possible to assess a social envi- 
ronment, to provide feedback to the partic- 
ipants about the characteristics of their mi- 
lieu, to engage and motivate them to change 
the social milieu in directions they them- 
selves desire, and to monitor and evaluate 
the results of the change process (Moos, 
1974; Tuckman, McCall, & Hyman, 1969). 
Data from the CES can be used to formulate 
planned change within the classroom and to 
evaluate the change in relation to student 
and teacher perceptions of the social envi- 
ronment and changes in student behavior, 
including a reduction in the absenteeism rate 
(DeYoung, 1977). Thus, for example, 
classroom settings which are identified as 
high risk early in the school year could be 
candidates for further environmental diag- 
nosis and preventive counseling. This 
change-oriented social systems approach 
may constitute a useful set of procedures to 
complement those usually provided by 
school counseling and guidance services.


Reference Note 


1. Karweit, N. Rainy days and Mondays: An i 
Teachers’ Sensitivity to the Reliability of Information in
Making Causal Attributions in an Achievement Situation 


Hilda Borko and Richard J. Shavelson 
University of California, Los Angeles 


This study examined the effects of reliability and valence of information on 
attributions in an achievement situation. There were 164 subjects (teachers 
and nonteachers) who read 1 of 16 different scenarios (a 2 × 4 × 4 design)
describing a fictitious student. The information in the scenarios varied in terms
of reliability and valence. Subjects then rated the extent to which the student’s
academic success was due to each of four causal factors: ability, effort,
exam difficulty, and luck. Attributions to ability and effort were greater for 
positive information; attributions to luck were greater for negative informa- 
tion. Attributions to ability were also influenced by reliability. These results 
partially followed Kelley’s discounting and augmentation principles. 


In decision-making models of teaching 
(e.g., Shavelson, 1976; Shulman & Elstein, 
1975), teachers’ estimates of students’ aptitudes¹
are expected to influence their
pedagogical decisions. These estimates 
represent an integration of information 
about students, such as test scores, anecdotal 
reports from other teachers, and personal 
observations. Attribution theory (Heider, 
1958; Kelley, 1973; Weiner, 1974, 1976) offers 
a possible model of how teachers arrive at 
their estimates, since it deals with the pro- 
cesses by which people integrate information 
to arrive at causal explanations for events. 
The influence of informational cues about 
students on teachers’ causal attributions is 
in broad terms the focus of this study. 


Attribution Theory and Educational 
Situations 


Weiner (1974, 1976) has identified four 
factors that are the most salient perceived 
causes of success and failure in achievement 
situations: ability, effort, task difficulty, 
and luck (cf. Frieze, 1976). Causal judg- 
ments are in part based upon information 
about these factors, which is readily available 
in educational situations. These four causal 


The authors wish to express their thanks to Nancy 
Russo and Bernard Weiner for their valuable critical 
review of the manuscript, to Joel Cadwell for his assistance
with the data analysis, and to Tonia Izu for her
assistance with the data collection. 

Requests for reprints should be sent to Richard J. 
Shavelson, Graduate School of Education, University
of California, Los Angeles, California 90024. 


factors have been represented on two pri- 
mary causal dimensions: locus (internal vs. 
external) and stability (stable vs. unstable). 
Ability and effort are factors internal to the 
person, whereas task difficulty and luck are 
external or environmental factors. Fur- 
thermore, ability and task difficulty are 
relatively stable or invariant over time, while 
luck and effort are less stable. 
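
The two-dimensional classification described above can be written out as a small lookup table. This is only an illustrative sketch: the dictionary and the helper `factors_with` are hypothetical names, and the labels simply restate the locus and stability assignments given in the text.

```python
# Each causal factor mapped to (locus, stability), as described in the text.
CAUSAL_FACTORS = {
    "ability": ("internal", "stable"),
    "effort": ("internal", "unstable"),
    "task difficulty": ("external", "stable"),
    "luck": ("external", "unstable"),
}


def factors_with(locus=None, stability=None):
    """Return the factors matching the given locus and/or stability."""
    return [
        name
        for name, (factor_locus, factor_stability) in CAUSAL_FACTORS.items()
        if (locus is None or factor_locus == locus)
        and (stability is None or factor_stability == stability)
    ]


print(factors_with(locus="internal"))
print(factors_with(stability="stable"))
```

Crossing the two dimensions yields exactly one factor per cell, which is why the four causes can be treated as a 2 × 2 scheme.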

In addition to specific informational cues, 
the type of cognitive structure that is used to 
derive causal judgments may affect per- 
ceived determinants of success and failure. 
One such cognitive structure that influences 
causal judgments is labeled a causal schema. 
Causal schemata are ways of thinking about 
the relationship between an observed event 
(an effect) and the perceived causes of that 
event. They provide the person with a 
means of making causal attributions given 
limited information (Kelley, 1978). 

Kelley (1973) has suggested that several 
causal schemata come into play when the 
person attempting to explain a particular 
event (the “attributor”) has information 
from only a single observation. Two of these 
causal schemata are particularly relevant to 
achievement-related situations. When ef- 
fects or outcomes are moderate, a multiple 
sufficient causal schema may be used. In 


1 “Aptitude, pragmatically, includes whatever pro-
motes the pupil’s survival in a particular educational
environment, and it may have as much to do with the 
types of thought and personality variables as with the 
ability covered in conventional tests” (Cronbach, p. 
24). 
this case, each possible cause in and of itself
is seen as potentially sufficient to produce 
the effect. Kelley’s discounting principle 
describes the pattern of causal attributions 
that arises when a multiple sufficient causal 
schema is used. According to this principle, 
when specific information about some 
plausible cause of a given effect is present, a 
person will discount the role of other possible 
causes in producing the effect. For example, 
if a person's ability is perceived to be an ad- 
equate explanation for an observed 
achievement outcome, other possible causes 
(e.g., task difficulty and luck) will be dis- 
counted (Kun & Weiner, 1973). 

Another causal schema that may also be
used in achievement situations is a com-
pensatory schema. This schema refers to
the belief that certain causal factors if suf- 
ficiently strong can overcome or compensate 
for the effects of other causal factors. The 
pattern of causal attributions that arises 
when a compensatory causal schema is used 
is described by Kelley’s augmentation 
principle. According to this principle, when 
inhibitory factors (i.e., factors that suppress 
an observed effect) are present, attributions 
to other plausible causes will become 
stronger. Thus, if a child who receives high 
grades does not try hard in school (i.e., the 
child's lack of effort is an inhibitory factor 
with respect to his or her academic achieve- 
ment), the classroom teacher may infer that 
learning activities are easy enough to com- 
pensate for the child's lack of effort. In this 
case, the role of task difficulty in producing 
the child's high grades is augmented. 

This study investigates whether teachers’ 
explanations of a student’s achievement 
outcomes follow predictions based on the 
discounting and augmentation principles. A 
confirmation of these predictions would 
suggest that teachers use multiple sufficient 
and compensatory causal schemata to ex- 
plain student achievement. Specifically, 
subjects were given either positive infor- 
mation, negative information, or a combi- 
nation of positive and negative information 
about a hypothetical fifth-grade student’s 
achievement-related behaviors and abilities 
and information about that student’s success 
in an academic situation. The attributional 
bias noted by Frieze and Weiner (1971) 
suggests that teachers will tend to attribute
a student’s success to internal or personal 
factors. It was therefore predicted that 
when positive information about the student 
was provided (i.e., when two plausible pre- 
ferred causes—effort and ability—were 
present), teachers would employ a multiple 
sufficient causal schema and attribute the 
student’s success to ability and effort while 
discounting the factors of task difficulty and 
luck. It was further predicted that when 
negative information was provided (i.e., 
when lack of ability and lack of effort were 
presented as inhibitory factors), teachers 
would employ a compensatory causal sche- 
ma and attribute increased importance to 
task difficulty and luck and decreased im- 
portance to ability and effort. Finally, since 
evidence suggests that teachers place greater 
emphasis on positive than negative infor- 
mation (cf. Weiner & Peter, 1973), it was 
predicted that when given conflicting in- 
formation, subjects would attribute the 
student's success to internal factors while 
discounting external factors. 


Reliability of Informational Cues 


Studies conducted within the perspective 
of attribution theory are based on the im- 
plicit assumption that information available 
to teachers is reliable. Yet, in reality, much 
of the information available to teachers (e.g., 
anecdotal records from previous teachers 
and classmates' comments) is unreliable. If 
teachers are sensitive to the reliability of 
information, they should ignore unreliable 
information and base causal explanations 
and pedagogical decisions on a model of the 
average student.

There is some evidence that teachers do
not differentiate between reliable and un- 
reliable information. For example, Dusek 
(1975) and Smith and Luginbuhl (1976) have 
shown that in laboratory studies, teachers' 
interactions with students are influenced by 
unreliable information about the students. 

However, other studies, in which teachers 
were given both reliable and unreliable in- 
formation, suggest that teachers may attend 
more to reliable than unreliable information. 
Yoshida and Meyers (1975) found that
teachers' estimates of a child's future per-
formance were more strongly influenced by
their own observations than by unreliable 
prior information. Shavelson, Cadwell, and 
Izu (1977) gave subjects information about 
a hypothetical student at two points in time. 
The information was either reliable or un- 
reliable and either positive or negative. 
After each presentation of information, 
subjects made estimates of the student’s 
future academic performance and made
several pedagogical decisions. In making
and revising their estimates and instruc- 
tional planning decisions, subjects tended to 
ignore unreliable information. 

A second purpose of this study, then, was 
to determine whether teachers’ causal ex- 
planations for student behavior are sensitive 
to the reliability of informational cues. 
Thus, information about the hypothetical 
student’s achievement-related behaviors and 
abilities was varied with respect to reliability 
(reliable vs. unreliable) as well as valence 
(positive vs. negative). It was predicted that 
teachers would tend to ignore the unreliable 
information. When given reliable infor- 
mation, it was expected that the discounting 
and augmentation principles would come 
into play, resulting in the attributional pat- 
terns described above. However, when in- 
formation was unreliable, it was expected 
that teachers’ attributions to ability, effort, 
task difficulty, and luck would be unaffected 
by the valence of the information. 
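The predictions stated so far can be summarized as a small lookup. The sketch below is only an illustrative encoding and is not part of the study's materials: the function name `predicted_shift`, the labels "higher", "lower", and "unchanged", and the use of a single reliable/unreliable flag (mixed-reliability scenarios are not modeled) are all assumptions.

```python
# Locus classification of the four causal factors (Weiner, 1974, 1976).
LOCUS = {
    "ability": "internal",
    "effort": "internal",
    "task difficulty": "external",
    "luck": "external",
}


def predicted_shift(valence: str, reliable: bool) -> dict:
    """Predicted direction of each attribution for a successful student.

    valence is "positive", "negative", or "mixed". The labels returned are
    hypothetical shorthand, not the coding used in the study.
    """
    if valence not in ("positive", "negative", "mixed"):
        raise ValueError(f"unknown valence: {valence!r}")
    if not reliable:
        # Unreliable information was expected to be ignored.
        return {factor: "unchanged" for factor in LOCUS}
    # Only consistently negative information turns the internal causes into
    # inhibitory factors, which augments the role of external causes.
    internal_plausible = valence != "negative"
    shift = {}
    for factor, locus in LOCUS.items():
        if locus == "internal":
            shift[factor] = "higher" if internal_plausible else "lower"
        else:
            shift[factor] = "lower" if internal_plausible else "higher"
    return shift
```

Under this encoding, positive and mixed information both follow the discounting pattern, and only consistently negative reliable information follows the augmentation pattern.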


Method 


This study examined the effects of reliability and 
valence of information about a student on teachers’ and 
nonteachers' attributions for that student's academic 
success. Subjects read descriptions of a hypothetical
student, which varied as to the reliability of the infor- 
mation (reliable or unreliable information) and the 
valence (positive or negative information). They then 
rated the extent to which the student’s academic success 
was due to each of the four causal factors: ability, ef- 
fort, difficulty of the exams, and luck. 


Subjects 


Subjects were 164 graduate students in education at 
a large California university; 112 subjects were female 
and 52 were male. These subjects were 119 teachers 
and 45 nonteachers. Each subject was enrolled in one 
of six sections of a graduate research course in the De- 
partment of Education. Of the 164 subjects, 9 had 
missing data on one or more of the attributional judg-
ments. Missing data were unrelated to treatment
condition. Therefore, data from these 9 subjects were 
not included in the analyses. 


Treatment Variables 


Sixteen different scenarios were constructed to de- 
scribe a fictitious fifth-grade student. Each scenario 
contained two sets of information (initial and addi- 
tional), which varied in terms of reliability (reliable or 
unreliable) and valence (positive or negative). The
initial information about the student was typical of
information available to a teacher at the beginning of
the school year. The additional information described
the same student about halfway through the school year.
The 16 scenarios represented all possible combinations
of reliability and valence of initial and additional in- 
formation. Each scenario began with the following base 
story: 


Michael is 10 years old and beginning the fifth 
grade. He lives with his parents, an older brother, 
and two younger sisters. 


Reliability and valence of initial information were 
varied along three dimensions: father's occupation, 
Michael's use of time, and Michael's intelligence. The 
reliable positive and negative inserts were the following 
(negative inserts appear in parentheses): 


In an interview with his parents, his father gave his
occupation as an engineer (a machinist) in an aero- 
dynamics firm. In the interview, his parents also 
noted that Michael spent about 2 hours each eve- 
ning on his homework and reading books (never did 
any homework but spent 2 hours each evening 
watching television). On an individual intelligence 
test, Michael scored quite high (low). 


The unreliable positive and negative inserts were the 
following: 


In an interview, a classmate stated that while he did 
not know Michael well, he thought Michael's father 
helped to design airplanes (worked on airplanes). 
He also thought Michael enjoyed doing his home- 
work (never did any homework), spent a lot of time 
on extra credit reports (watching television), and 
was very smart (not very smart). 


Reliability and valence of additional information 
were also varied on three dimensions: academic ability, 
curiosity, and attitude toward school. The reliable
positive and negative inserts were the following: 


At mid-semester, Michael was tested in math and 
reading. The results showed that he was perform-
ing at about seventh- (third-) grade level, approxi-
mately 2 years ahead of (behind) expectations for
his age. The school psychologist reported that Mi- 
chael's curiosity enhanced his ability to do well in 
his math and reading and that he had an enthusias- 
tic and positive attitude toward school (Michael 
had difficulty in directing his curiosity to school ac-
tivities, often becoming distracted and losing inter-
est in class discussions, and that he had a negative
attitude toward school). 


The unreliable positive and negative inserts were the 
following: 


When interviewed, some of Michael’s classmates 
said that they liked him (did not particularly like 
him) and that they thought he was a good student 
(was not a very good student). Cathy Robbins, an 
education student at a nearby college, had been 
hired as a substitute aide at Michael’s school. She 
had assisted in Michael's class for a few days and 
had decided to administer an inkblot test to the 
class. She interpreted the results to mean that Mi- 
chael was curious and that he had a positive atti- 
tude toward school (that Michael's curiosity led 
him to be easily distracted from academic activities 
and that he had a negative attitude toward school). 


Instrumentation and Dependent 
Variables


The dependent variables were subjects' ratings of the
importance of four causal factors in determining Mi- 
chael's successful academic performance, indicated by 
As and Bs on his final report card. These variables were 
measured by a question following each story: 


Suppose that Michael actually received mostly As 
and Bs on his final report card. To what extent 
were these grades due to (a) the fact that the exams 
given in his class were easy, (b) the amount of time 
Michael spent studying, (c) Michael's intelligence 
and ability to comprehend the class assignments, 
and/or (d) the fact that Michael was lucky in guess- 
ing on exams. 


Subjects rated the importance of each factor on a scale 
from 1 (not a factor in earning these grades) to 6 (a 
factor in earning these grades). 


Procedure 


Subjects were randomly assigned to 1 of 16 cells of the 
between-subjects design. In a written introduction to 
the study, they were informed that they would receive 
characteristic information available to teachers about 
their students and that this information was taken from 
a state-wide study of teaching. They were told that 
they would receive information about a specific student
and were asked to read the information as if they were 
that student’s teacher. Following these instructions,
subjects read the initial and additional information and
made their attributional judgments. They were in- 
structed not to return to any of the preceding pages of 
the booklet during the study. 
The study was conducted in six classes on 3 consecutive
days. Subjects absent during those days participated
during 1 of 3 consecutive days of the following
week. There was no time limit, but all subjects
finished within a half hour.
Design and Data Analysis


The study utilized a 2 × 4 × 4 design (Teaching Ex-
perience × Reliability × Valence) with four dependent
measures (attributions to the four causal factors). The 
two levels of teaching experience were teachers (subjects 
with teaching experience) and nonteachers (subjects 
with no teaching experience). A preliminary analysis 
of the data did not find a significant main effect for 
teaching experience or significant interactions of this 
variable with the other variables in the study. As a 
consequence, the data for teachers and nonteachers 
were pooled in subsequent analyses.

The four levels of reliability were (a) reliable initial
and additional information (+/+), (b) unreliable initial
and additional information (—/—), (c) reliable initial
information and unreliable additional information
(+/—), and (d) unreliable initial information and reli-
able additional information (—/+). The four levels of
valence were (a) positive initial and additional infor-
mation (+/+), (b) negative initial and additional in-
formation (—/—), (c) positive initial information and
negative additional information (+/—), and (d) negative
initial information and positive additional information
(—/+). This design enables an examination of the ef-
fects due to patterns of reliability and valence. For
example, the effect of consistency of valence can be
examined by comparing attributions to consistently
positive information (+/+ condition), consistently
negative information (—/— condition), and information
of mixed valence (+/— and —/+ conditions).

The data were first examined using a multivariate
analysis of variance (Finn, 1974) to determine the ef-
fects of the two independent variables on subjects' at-
tributions. Univariate analyses of variance were then
conducted to determine which causal factors reflected
significant effects in the multivariate analysis. The
level of significance for the multivariate analyses was
set at .05. In interpreting results of the univariate
analyses, only those effects that corresponded to sig-
nificant effects in the multivariate analysis were con-
sidered. As with the multivariate analysis, the level of
significance for the univariate analyses of variance and
all post hoc comparisons was set at .05.
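As a check on the layout just described, the 16 cells and the univariate degrees of freedom can be enumerated directly. This is a minimal sketch, assuming the pooled 4 × 4 between-subjects layout (teaching experience dropped) and the 155 subjects with complete data; the variable names are illustrative only.

```python
from itertools import product

# Each pattern is written as initial/additional information.
PATTERNS = ["+/+", "-/-", "+/-", "-/+"]

# Every reliability pattern crossed with every valence pattern.
cells = [{"reliability": r, "valence": v} for r, v in product(PATTERNS, PATTERNS)]

n_analyzed = 164 - 9                      # subjects with complete data
df_valence = len(PATTERNS) - 1            # main effect of valence
df_reliability = len(PATTERNS) - 1        # main effect of reliability
df_interaction = df_valence * df_reliability
df_error = n_analyzed - len(cells)        # assumes one error term for the pooled design

print(len(cells), df_valence, df_interaction, df_error)  # prints: 16 3 9 139
```

Under these assumptions the main effects have 3 degrees of freedom, the interaction 9, and the error term 139.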


Results 


Descriptive Data 


On the average, subjects viewed ability as 
the most important determinant of Mi- 
chael’s academic success (M = 4.91, SD = 
.95), followed closely by effort (M = 4.15, SD
= 1.20). The difficulty of the exams was
viewed as much less of a factor determining 
success (M = 2.96, SD = 1.44), and luck was 
seen as least important (M = 2.25, SD = 
1.18). This pattern of ratings supported 
Frieze and Weiner’s (1971) finding that 
people tend to attribute successful outcomes 
to internal or personal factors. 
Examination of Treatment Effects


A multivariate analysis of variance was 
used in the initial analysis of the data to de- 
termine whether there were overall differ- 
ences in patterns of attributions. In addi- 
tion to providing information about group 
differences in four-dimensional space, the 
multivariate analysis avoids the problems 
associated with conducting multiple univa- 
riate analyses of variance (cf. Finn, 1974). 
This analysis was performed using MULTIV 
(Finn, 1968), which computes F ratios using 
a transformation of Wilks’ lambda. As ex-
pected, the analysis yielded a significant
main effect for valence of information (F = 
3.01; df = 12, 360) and a significant interac- 
tion between valence and reliability of in- 
formation (F = 1.67; df = 36, 511). 
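The text does not say which transformation of Wilks' lambda was applied. The sketch below assumes Rao's F approximation, with p = 4 dependent variables and 139 error degrees of freedom (155 subjects minus 16 cells, itself an assumption about the pooled design); the function names are hypothetical.

```python
import math


def rao_f_df(p: int, q: int, v: int) -> tuple:
    """Degrees of freedom for Rao's F approximation to Wilks' lambda.

    p: number of dependent variables
    q: hypothesis degrees of freedom
    v: error degrees of freedom
    """
    denom = p * p + q * q - 5
    t = math.sqrt((p * p * q * q - 4) / denom) if denom > 0 else 1.0
    w = v - (p - q + 1) / 2
    df1 = p * q
    df2 = w * t - (p * q - 2) / 2
    return df1, df2


def rao_f(wilks_lambda: float, p: int, q: int, v: int) -> float:
    """Approximate F ratio for a given value of Wilks' lambda."""
    denom = p * p + q * q - 5
    t = math.sqrt((p * p * q * q - 4) / denom) if denom > 0 else 1.0
    df1, df2 = rao_f_df(p, q, v)
    root = wilks_lambda ** (1.0 / t)
    return (1.0 - root) / root * df2 / df1


# Valence main effect (q = 3) and Valence x Reliability interaction (q = 9).
for q in (3, 9):
    df1, df2 = rao_f_df(p=4, q=q, v=139)
    print(df1, math.floor(df2))
```

Under these assumptions the truncated values are 12 and 360 for the main effect and 36 and 511 for the interaction, which agree with the degrees of freedom reported above.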

Four univariate analyses of variance were 
then conducted to determine whether the 
significant effects were localized in some 
subset of the four dependent variables. 
Univariate analyses of variance were selected
as the appropriate statistical tests, since 
ratings of the four causal factors were not 
highly correlated (see Table 1) and since 
previous research within the attributional 
framework has examined each of these 
variables separately. 

Scheffé post hoc comparisons were then 
conducted for all significant main effects to 
identify significant differences among 
means. Tests of simple main effects were 
conducted for all significant interaction ef- 
fects, and Scheffé post hoc comparisons were 
computed for significant simple main effects 
to identify significant differences among 
means. To facilitate understanding and 
interpretation of the findings from these 
analyses, results for each causal factor are 
discussed separately. 
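For readers unfamiliar with Scheffé comparisons, the sketch below shows one common formulation: the F ratio for a contrast among cell means is divided by k - 1 and referred to the F distribution with k - 1 and error degrees of freedom. The function names and all numbers are hypothetical and are not the study's data; whether the F values reported below were computed in exactly this form is not stated in the text.

```python
def contrast_f(means, ns, weights, mse):
    """F ratio (1 numerator df) for a linear contrast among group means."""
    if abs(sum(weights)) > 1e-9:
        raise ValueError("contrast weights must sum to zero")
    psi = sum(w * m for w, m in zip(weights, means))
    se2 = mse * sum(w * w / n for w, n in zip(weights, ns))
    return psi * psi / se2


def scheffe_f(means, ns, weights, mse):
    """Contrast F divided by (k - 1), to be referred to F(k - 1, error df)."""
    return contrast_f(means, ns, weights, mse) / (len(means) - 1)


# Hypothetical values for illustration only; they are NOT the study's data.
means = [5.1, 4.2, 5.0, 5.2]           # valence order: +/+, -/-, +/-, -/+
ns = [39, 39, 39, 38]
weights = [1 / 3, -1.0, 1 / 3, 1 / 3]  # -/- versus the average of the others
print(round(scheffe_f(means, ns, weights, mse=0.85), 2))
```

The weights encode the comparison of the consistently negative condition with the average of the other three valence conditions, which is the form of comparison used in the analyses that follow.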

Table 1
Correlation Matrix for the Four Causal
Factors: Ability, Effort, Task, and Luck

Causal factor       1        2        3
1. Ability
2. Effort         .126
3. Task          —.189    —.066


Attributions to ability. Attributions to
ability as an explanation for Michael’s suc-
cess were expected to be greater for positive
information than for negative information. 
In the former case, ability (an internal fac- 
tor) was a plausible cause of success; in the 
latter case, it was not. A significant main 
effect was found for valence of information 
(F = 4.46; df = 3, 139). A Scheffé post hoc
comparison of the mean ability rating for the 
—/— valence condition with the average of 
ratings for all other valence conditions sup- 
ported predictions for attributions to ability. 
This comparison showed that subjects rated 
ability significantly higher when any positive 
information was provided than when all in- 
formation was negative (F = 14.39; df = 3, 
139). The difference between attributions 
to ability when all information was positive 
(+/+) and when some information was pos-
itive and some was negative (+/— and —/+)
was not significant. Thus, subjects viewed 
ability as a more important determinant of 
Michael’s performance when information 
was positive than when it was negative. In 
addition, the teachers seemed to place more 
emphasis on positive information about a 
child than on negative information when 
making causal attributions to ability for that 
child's academic success (cf. Weiner & Peter, 
1973). 

The interaction between valence and re- 
liability of information was also significant 
(F = 2.59; df = 9, 139; see Figure 1). A 
simple main effects analysis showed that 
significant differences in attributions due to
valence of information occurred only when
all information was reliable (F = 6.46; df =
3, 139). As predicted, subjects rated ability
significantly higher when any reliable posi- 
tive information was provided (+/+, +/—, 
and —/+) than when all reliable information 
was negative (—/—) (F = 18.78; df = 3, 139). 
Thus, the influence of valence on teachers’ 
attributions to ability was greater when in- 
formation was reliable than when it was 
unreliable. 

Attributions to effort. A pattern of at- 
tributions similar to those for ability was 
expected for effort, the other internal causal 
factor. Again, the main effect for valence of 
information was significant (F = 5.75; df = 
3, 139). Scheffé post hoc comparisons in- 
dicated that the crucial determinant of at- 
Figure 1. Effects of reliability and valence of information on attributions to ability. (Levels of reliability
are indicated by + for reliable and — for unreliable. The first position refers to initial information, while
the second position refers to additional information. Levels of valence are indicated by + for positive
and — for negative. Again, position refers to initial or additional information.)




tributions to effort was the initial informa- 
tion provided to subjects. Attributions were 
significantly higher when initial information 
was positive (in the +/+ and +/— valence 
conditions) than when it was negative (in the 
—/+ and —/— valence conditions) (F = 16.85;
df = 3, 139). The greater salience of initial
information than additional information can 
be explained by a reexamination of the 
scenarios. The initial descriptions provided 
information about Michael’s effort (i.e., the 
amount of time Michael spends on home-
work), while additional information was not
directly relevant to effort. Therefore, initial
information but not additional information
significantly affected attributions to effort.


As in the case of attributions to ability,
there was a significant interaction between
valence and reliability of information (F =
2.01; df = 9, 139; see Figure 2). A simple
main effects analysis showed that when initial
information was positive, attributions to
effort were consistently high. These attributions
were not affected by
the reliability of initial information or by the
reliability or valence of additional informa-
tion. One possible explanation for this
finding is that teachers may attribute stu- 
dents’ successful academic performance to 
student effort. Thus, when there is any ev-
idence at all that a successful student has
tried, the teacher will rate effort as an im- 
portant determinant of that student’s suc- 
cess. In contrast, when initial information 
was negative, there were significant effects 
due to the reliability of information (in the 
—/— valence condition, F = 4.00, df = 3, 139;
in the —/+ valence condition, F = 2.79, df =
3, 139). These findings are not easily ex-
plained by Kelley's (1973) attribution prin-
ciples or by the hypothesis that teachers are
sensitive to the reliability of information. 
Perhaps when information about effort (i.e., 
the initial information) was inconsistent with 
Michael's performance, factors not measured 
in this study, such as individual differences
in theories of teaching or educational beliefs,
determined the importance teachers at-
tributed to effort as a cause of Michael's
success. 


Attributions to luck. Luck, an external 
Figure 2. Effects of reliability and valence of information on attributions to effort. (Levels of reliability
are indicated by + for reliable and — for unreliable. The first position refers to initial information, while
the second position refers to additional information. Levels of valence are indicated by + for positive
and — for negative. Again, position refers to initial or additional information.)


factor, was expected to be viewed as more 
important when information was negative 
than when it was positive. This prediction 
arises because negative information and 
successful performance are inconsistent, so 
that external rather than internal factors are 
plausible explanations for Michael’s success. 
The significant main effect for valence (F =
2.96; df = 3, 139) supported this prediction. 
A comparison of the mean attribution to luck 
in the —/— valence condition with the aver- 
age of attributions to luck in all other valence 
conditions showed that luck was rated sig- 
nificantly higher when all information was 
negative than when any information was 
positive (F = 12.27; df = 3, 139). Attribu-
tions to luck did not differ significantly when
the group receiving all positive information
was compared to the mean of the groups re-
ceiving some positive information (F < 1.0).
Apparently, the teachers only chose to use 
luck as an explanation for success when all 
information about a child was inconsistent 
with that child’s performance (i.e., when all 
information was negative). 

Attributions to difficulty of the exams.
Attributions to difficulty of the exams, the
other external causal factor, were also ex- 
pected to be higher when information and 
performance were inconsistent than when 
they were consistent. These predictions 
were not confirmed, as none of the three ef- 
fects led to significant differences in attri- 
butions to the difficulty of the exams. One 
possible explanation for this lack of signifi- 
cant differences is that no information about 
the exams was provided in the scenarios. 
Perhaps, in the absence of such information, 
the teachers were more willing to make in- 
ferences about luck than about exams. 
Thus, when faced with performance that 
could not be explained by internal factors, 
the subjects turned to luck rather than exam 
difficulty as a possible explanation for Mi- 
chael’s success. 


Discussion 


One purpose of this study was to deter- 
mine whether teachers’ explanations of a 
student’s academic success follow Kelley’s 
(1973) discounting and augmentation prin-
ciples. The discounting principle predicts
that when internal factors are presented as 
plausible explanations for the student’s 
successful academic performance, these in- 
ternal factors will be seen as important 
causes of the behavior, and external factors 
will be discounted. The augmentation 
principle predicts that when internal causes 
are presented as inhibitory factors, these 
internal factors will be perceived as less im- 
portant and the role of external factors in 
producing the observed behavior will be 
augmented. Results of the study partially 
supported these predictions. When plau- 
sible internal causes were present (i.e., when 
positive information about Michael was 
provided), teachers discounted luck, an ex- 
ternal factor. Conversely, when lack of 
ability and lack of effort were presented as 
inhibitory factors, teachers attributed in- 
creased importance to luck and decreased 
importance to ability. 

Attributions to exam difficulty, however, 
did not follow the pattern predicted by 
Kelley's principles. These attributions were
not influenced by the variables manipulated 
in this study. One possible explanation for 
this finding is the absence of direct infor- 
mation about the exams. In this case, 
subjects may have been hesitant to make any 
inferences about exam difficulty as a cause 
of the student’s success.

A second purpose of this study was to in- 
vestigate the influence of the reliability of 
information on teachers’ causal attributions.
Subjects’ attributions to ability were sig- 
nificantly influenced by the valence of in- 
formation only when the information was 
reliable. Attributions to luck and exam 
difficulty, however, were not affected by the 
reliability of the information. Again, the 
fact that information about these external 
factors was not present in the scenarios is one
possible explanation for this finding. 

The pattern of attributions to effort,
particularly when information about effort
was negative, is the most difficult to explain
using either Kelley’s principles or the hypothesis
of an effect due to reliability of infor-
mation. It may be that individual differ-
ences among teachers, such as their theories
about teaching or educational beliefs, are
particularly powerful determinants of
teachers’ attributions to effort as a cause of
student success. 

These results, taken as a whole, suggest 
that teachers do use informational cues 
about a student and search for patterns 
among them in order to determine the im- 
portance of ability, effort, luck, and task 
difficulty in accounting for successful aca- 
demic performance. In utilizing these cues, 
teachers seem to employ multiple sufficient
and compensatory causal schemata and to 
take into account the reliability of informa- 
tion. 

If pedagogical decisions are, in fact, based 
on teachers’ attributions for student per- 
formance, these findings suggest ways of 
helping teachers to improve their classroom 
decision making. As a first step, teachers
could be made aware of the ways in which 
they explain student behavior (i.e., their 
attributions) and the implications of these 
attributions for making pedagogical deci- 
sions. This awareness would, hopefully, 
lead them to examine their personal decision 
strategies. In addition, teachers and 
teachers in training could be given practice 
in making instructional decisions. Teachers 
could be asked to state what information
they use and how they combine this infor-
mation when making decisions about
grouping students, organizing class activities, 
assigning lessons, and so on. These stated 
decision policies could be compared to their 
responses to written descriptions of students
similar to the scenarios used in this study. 
Discrepancies between these actual policies 
and the stated policies could be reported 
back to the teachers. Feedback could focus 
on the teachers’ evaluation of the reliability 
of information and on the consistency of 
their attributional and decision strategies. 
Given this information, teachers could either 
revise their decisions to match their stated 
policies or review and alter their policies.

Several characteristics of this study limit
the generalizability of the findings. First,
scenarios themselves, while typically used in
attributional research, represent a situation
[...] further research in actual classroom situa-
tions is needed to determine the settings in
which the findings hold. 

In addition, several characteristics of the 
scenarios used in this study further limit the 
study's generalizability. In all scenarios, the 
student described was a fifth-grade boy who 
was academically successful. Future studies 
should include scenarios describing female 
students, students at grade levels other than 
fifth, and students with unsuccessful aca- 
demic performance to enable generalization 
of results to a broader student population. 

Future research should also systematically 
vary the valence and reliability of informa- 
tion about external causal factors (especially 
exam difficulty or task difficulty) in addition 
to varying information about internal fac- 
tors. This will provide a more direct test of 
predictions for attributions to external 
causal factors based on Kelley's (1973) 
principles and teachers' hypothesized sen- 
sitivity to the reliability of information. 
Finally, future research should include 
measures of individual differences in addi- 
tion to experimentally manipulated vari- 
ables in order to investigate the role these 
factors play in determining teachers' attri- 
butional patterns. 
Learner Expectations Induced by Adjunct Questions
and the Retrieval of Intentional and Incidental Information 


Sabato D. Sagaria and Francis J. Di Vesta 
Pennsylvania State University 


There were 150 subjects who studied a passage with questions interspersed at 
different locations. Thirty different randomized sets of adjunct questions 
were yoked to subjects across treatments. Answers were written when the 
reading of a paragraph was completed. Intentional performance in the pre- 
questions treatment was lower than in the postquestions or combined pre/ 
postquestions treatments but higher than performance in the no-question 
treatment. Incidental performance was lowest in the prequestions treatment 
and equal in the remaining treatments. Total level of acquisition was highest
in those treatments involving the use of postquestions and no questions. The 
results were attributed to the influence of adjunct questions on learner expec- 
tations that affect the selective processing of information. 


Some of the sources from which a learner 
may infer the demand characteristics for a 
given learning situation are derived from one 
or more expectancies (implied from a history 
of experience) regarding such facets of the 
classroom as the teacher, the type and or- 
ganization of material being studied, the 
objectives, and the instructional activity, 
including the type of tests used. The 
present study is based upon only one of these 
considerations, that is, learner expectations 
hypothesized to be differentially induced by 
an instructional activity that employs adjunct
questions. In particular, we assumed
that variations in the application of adjunct 
questions have differential effects on ac- 
quisition because of the expectations they 
induce in the learner, which in turn, mediate 
the behaviors activated by various stimuli in 
the learning setting. Our framework is de- 
rived from a consideration of the mathe- 
magenic, cybernetic, and additive models 
regarding the effects of adjunct questions on 
learning from text.

A major part of this article is based on the first author’s
master’s thesis submitted to Pennsylvania State
University. Portions of the thesis were presented at the
annual meeting of the American Educational Research
Association, New York City, April 1977.

Requests for reprints should be sent to Francis J. Di 
Vesta, Division of Counseling and Educational Psy- 
chology, Pennsylvania State University, 421 Carpenter 
Building, University Park, Pennsylvania 16802. 


The mathemagenic view (Rothkopf, 1965, 
1966; Rothkopf & Bisbicos, 1967) assumes 
that adjunct questions differentially control 
the acquisition of intentional and incidental
information through the relation of ques- 
tions to text. On the one hand, postques- 
tions control learning contingencies: Be- 
haviors that result in successful performance 
(ability to answer questions successfully) are 
enhanced; those that result in failure are 
extinguished. The learner's successful use 
of postquestions not only increases the 
probability that the next passage will be 
studied in a similar manner but also leads to 
an increase in the acquisition of incidental 
information when compared to the use of 
prequestions only. 

On the other hand, prequestions function 
as discriminative cues. They control the 
identification and achievement of informa- 
tion that the learner expects will be consid- 
ered important by the authority (teacher or 
experimenter) who sets the demands for a 
given study session. Acquisition of inci- 
dental information, because it is excluded by 
the questions, is therefore inhibited. Although
prequestions do not facilitate the
acquisition of incidental information to the 
same extent as postquestions, they do facil- 
itate the acquisition of intentional infor- 
mation (Anderson & Biddle, 1975; Rickards, 
1976; Rothkopf, 1966). From this evidence, 
we reasoned that depressed learning of incidental
information (relative to the effect
of postquestions) is due to the rejection of 
information unrelated to the questions, since 
by implication, the information is considered 
by the learner to be irrelevant to his or her 
immediate objectives (see Duell, 1974, for a 
further discussion of this point). 

The absence of adjunct questions may 
influence learner expectations by implying 
that since there is no definite basis for de- 
termining what is important, some amount 
of it should be learned, depending upon what 
the situational cues imply. Compared to the 
direction provided by questions, the demand 
characteristics of the no-question learning 
situation remain somewhat ambiguous. 
Only when the demands aroused by ques- 
tions direct the student to information he or 
she would not ordinarily consider important 
under no questions will learning be facili- 
tated by the use of questions or objectives 
(Duell, 1974). This view indicates that a 
no-question treatment must be incorporated 
into each experiment on adjunct questions 
in order to control for unforeseen demands 
and their associated expectations. In the 
present study, for example, we assumed that 
the effect of the combined use of pre- and 
postquestions may be no better than the use 
of no questions if in the experimental situa- 
tion, the expectancy was induced that all of 
the information was important. 

From the foregoing analysis, it can be
seen that whatever the treatment, the ma- 
themagenic position raised important the- 
oretical questions regarding what expec- 
tancies are developed by the learner as a 
consequence of the use of adjunct questions 
and how the learner adapts to these expec- 
tancies when meeting task requirements. 

The cybernetic framework (Frase, 1969), 
which deals with feedback control processes, 
is consistent with the mathemagenic ap- 
proach. Presumably, an adjunct question 
is employed by the learner as a study guide. 
Confronted with a prequestion (or the pos- 
sibility of a postquestion), the learner reads 
the text to find the answer to the question (or 
to the anticipated question). With positive 
feedback (i.e., the correct answer), a suc- 
cessful strategy is adopted and maintained 
in use. Failure to obtain a correct answer 
generates an error signal (negative feedback).
The error signal directs the learner
to alter the strategy applied to subsequent
paragraphs in order to increase the proba- 
bility of obtaining positive feedback in the 
form of meeting some externally or inter- 
nally imposed criterion. On the basis of 
evidence from this framework, our rationale 
was extended to indicate that adjunct 
questions become effective stimuli by 
evoking expectancies that guide the selection 
of specific learning strategies. 

The third orientation to the use of adjunct 
questions is the relatively more recent ad- 
ditive model (Boyd, 1973). It considers the 
effect of adjunct questions on two mental 
operations: attention and retention. At- 
tention is the process of “putting informa- 
tion into some form of storage and is opera- 
tionalized as immediate or nearly immediate 
recall of information” (Boyd, 1973, p. 31).
Retention means “to represent either or both 
the storage of information or the retrievability
of the material from storage over time”
(Boyd, 1973, p. 31). Furthermore, “if pre- 
questions increase a subject's attention for 
intentional material and postquestions re- 
tard the rate of forgetting for material at- 
tended to, then the effect of giving a set of 
prequestions with identical postquestions 
should be to increase intentional posttest 
scores more than a set of pre- or postques- 
tions alone” (Boyd, 1973, p. 32). We have 
reinterpreted this statement to imply that 
prequestions influence expectancies that 
facilitate intentional learning and post- 
questions influence expectancies that facil- 
itate incidental learning. After careful 
consideration of this view, we saw no reason 
why the two sets of questions had to be 
identical if additive effects were to be iden- 
tified. 

Based on these several orientations, our 
major hypothesis was that relative to a 
reading-only (no question) treatment (NoQ), 
the effect of pre- and postquestions in com- 
bination (QBA) would be equivalent to the 
combined independent effects of preques- 
tions only (QB) plus postquestions only 
(QA). Since the same adjunct items were 
used across treatments, this hypothesis can 
be restated as (QB - NoQ) + (QA - NoQ) =
(QBA - NoQ). Criterial performance was
defined in terms of both incidental and intentional
test scores.

To test this hypothesis adequately re- 
quired several methodological provisions, 
one or more of which had been neglected in 
previous studies. These provisions were as 
follows:

1. In order to enhance the generalizability
of results to a general college population,
a heterogeneous sample of college students 
from a variety of sources was employed. In 
the past, sampling was often restricted to a 
homogeneous target population (for exam- 
ple, college sophomores in introductory 
psychology, office workers, or paid volun- 
teers). 

2. In order to control for situational de- 
mand effects, a no-question control group 
was used. 

3. Procedures for developing an item 
pool were defined in terms of the domain of 
knowledge in the text, as represented by the 
number of sentences. Each sentence was 
assumed to convey a single idea. 

4. The items for the adjunct questions 
and for the criterion tests were selected at 
random from the item pool, so that 30 dif- 
ferent forms of the test were employed and 
randomly administered to subjects within a 
treatment. Each form was represented 
across treatments. 

5. The placement of adjunct questions 
within the text was determined by the 
amount of information (seven sentences) 
preceding or following a given item. Place- 
ment was standardized across treatments. 

6. In order to provide for comparison of 
results with other studies, appropriate rep- 
lication treatments were incorporated into 
the design. 


Method

Design


The experimental design implied the use of a mixed
analysis ... subjects factor consisting of three levels of question ...


Subjects


The subjects (N = 150) were a heterogeneous group of
volunteers from among undergraduates matriculated
at a land grant eastern university. Some (n = 13) came
from an introductory educational psychology course for 
which course credit could be earned by participating in 
the study. The remaining subjects (n = 137) were 
students living in the university residence halls and were 
not enrolled in the introductory educational psychology 
course. Each subject was randomly assigned, by ref- 
erence to a table of random digits, to one of the five ex- 
perimental treatments, with the restriction that ran- 
domization was recycled at N + 1 treatments. There 
were 30 subjects in each treatment. 
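
The restricted randomization described above can be illustrated with a minimal sketch. It assumes a permuted-block scheme, which is one common way of guaranteeing equal group sizes; the original study used a table of random digits, and the function name, treatment labels, and seed below are hypothetical.

```python
import random

# Treatment labels follow the paper; the code itself is only illustrative.
TREATMENTS = ["TestOnly", "NoQ", "PreQ", "PostQ", "QBA"]


def assign_in_blocks(n_subjects, treatments, seed=0):
    """Assign subjects to treatments in shuffled blocks.

    Every block contains each treatment exactly once, so group sizes
    stay equal whenever n_subjects is a multiple of the block size.
    """
    rng = random.Random(seed)  # the seed is an assumption, for repeatability
    assignments = []
    while len(assignments) < n_subjects:
        block = list(treatments)
        rng.shuffle(block)  # random order within the block
        assignments.extend(block)
    return assignments[:n_subjects]


assignments = assign_in_blocks(150, TREATMENTS)
counts = {t: assignments.count(t) for t in TREATMENTS}
print(counts)  # each of the five treatments receives 30 subjects
```

With 150 subjects and five treatments, any scheme of this kind yields the 30 subjects per treatment reported here.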


Test Items 


Each main sentence idea was incorporated into a 
question representing such topics as names, measures, 
technical terms, or relationships. The resulting 70 
questions comprised the domain of all items represented 
in the passage.

Through the use of computer text processing programs,
40 questions were randomly selected from the
domain of items for the adjunct questions and for the 
postreading criterion measure. A set of 40 items contained
equal representation (n = 4) from each paragraph.
The randomization procedure specified the
random number generator to duplicate questions across 
treatments. The order of the questions in the criterion 
measure was further randomized to reduce the probability
of serial order or other artifactual effect. This
procedure resulted in a total of 30 different forms of the 
criterion test. Subjects within any one condition re- 
ceived unique sets of questions, but the same set of 
questions appeared in each condition, thereby providing 
a replication of items across treatments. 
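
As a rough illustration of the item-selection procedure, the sketch below draws 4 of the 7 items from each of the 10 paragraphs and then shuffles the test order. It is not the authors' program: the use of the form number as a random seed (so that the same form can be regenerated for every treatment) and all names are assumptions made for the example.

```python
import random

N_PARAGRAPHS = 10             # paragraphs in the passage
SENTENCES_PER_PARAGRAPH = 7   # one item per sentence, 70 items in all
ITEMS_PER_PARAGRAPH = 4       # 4 x 10 = 40 items per form
N_FORMS = 30


def build_form(form_id):
    """Build one test form; the same form_id always yields the same form."""
    rng = random.Random(form_id)  # assumption: seed by form number
    selected = []
    for p in range(N_PARAGRAPHS):
        pool = [(p, s) for s in range(SENTENCES_PER_PARAGRAPH)]
        selected.extend(rng.sample(pool, ITEMS_PER_PARAGRAPH))
    test_order = list(selected)
    rng.shuffle(test_order)  # randomize item order on the criterion test
    return {"items": selected, "test_order": test_order}


forms = [build_form(i) for i in range(N_FORMS)]
print(len(forms), len(forms[0]["items"]))
```

Seeding by form number is one simple way to give subjects within a condition different forms while reusing each form across conditions.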


Experimental Conditions 


A meaningful passage of approximately 800 words on 
vitamins was divided into 10 paragraphs, each of which 
was comprised of 7 sentences. Each sentence contained 
1 main idea. Open-ended questions, which could be 
correctly answered with 1 to 3 words, were constructed 
for each of the 70 sentences. 

In order to assure equivalent attention, the subject
rewrote the question on a separate sheet immediately
after it was read. The question was answered, again on 
a separate sheet, immediately after a given paragraph 
was read. The question was not repeated. Thus, in the 
prequestions treatment, the subject read the question, 
rewrote the question, read the paragraph, and then 
answered the question. For the postquestions treat- 
ment, the subject read the paragraph, read the question, 
rewrote the question, and answered the question. In 
the question before and after (QBA) treatment, these 
two procedures were combined. Each operation was 
on a separate page in each treatment. The subject was 
directed not to look back to pages already read or for- 
ward to new pages. The specific treatments were as 
follows: 
Test only. Subjects in this treatment were administered
only the criterion test. As a control, the test was
intended to obtain a base against which the performance
of other groups could be compared.




No question (NoQ). This was another control for 
determining the effects of merely reading the passage. 
Subjects in this treatment studied the material without 
the aid of adjunct questions. 

Preadjunct questions (PreQ). The subjects assigned 
to this treatment studied the passage in which each 
paragraph was preceded by one randomly selected 
question from the respective subpool of questions. 

Postadjunct questions (PostQ). The subjects as- 
signed to this treatment read the passage in which a 
question from each paragraph was placed after its re- 
spective paragraph. 

Combined pre- and postadjunct questions (QBA). 
The subjects in this treatment read the passage in which 
one question was placed before and a different question 
was placed after each paragraph. 
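
The order of operations in each treatment can be summarized in a short sketch that lists the pages a subject would work through. The step names and the function are hypothetical, and the ordering of the two answers after a paragraph in the QBA treatment is an assumption, since the text says only that the two procedures were combined.

```python
def page_sequence(treatment, n_paragraphs=10):
    """List the (step, paragraph, question position) pages for one treatment."""
    if treatment == "TestOnly":
        return []  # test-only subjects study no material
    pages = []
    for p in range(1, n_paragraphs + 1):
        if treatment in ("PreQ", "QBA"):
            pages.append(("read_question", p, "pre"))
            pages.append(("rewrite_question", p, "pre"))
        pages.append(("read_paragraph", p, None))
        if treatment in ("PreQ", "QBA"):
            # the prequestion is answered after the paragraph, not repeated
            pages.append(("answer_question", p, "pre"))
        if treatment in ("PostQ", "QBA"):
            pages.append(("read_question", p, "post"))
            pages.append(("rewrite_question", p, "post"))
            pages.append(("answer_question", p, "post"))
    return pages


for name in ("NoQ", "PreQ", "PostQ", "QBA"):
    print(name, len(page_sequence(name)))
```

Counting pages this way also makes the unequal workload across treatments visible, which bears on the study-time differences reported later.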


Procedure 


Students volunteered to participate in the study at 
their convenience. An average of seven subjects par- 
ticipated during any one session. They reported to a 
central area where general instructions were delivered 
by the experimenter. Upon answering questions, the 
experimenter provided treatment-related materials 
(which were also self-administered) for study in a place 
of the subject’s own choice. The role of the test-only 
subjects was explained separately to them. 

Upon completing the study of the material and after 
it was returned to the experimenter, the criterion test
was administered to the subject. At the conclusion of 
the experiment, the subject was interviewed to deter- 
mine whether (a) his or her role in the task was under- 
stood, (b) directions were followed, and (c) progression 
through the stimulus material was in a forward direction 
only. If a “no” answer to any question was suspected,
the subject was disqualified, resulting in the elimination 
of four subjects from the data analyses. 


Results 


The a priori rejection region for all anal- 
yses reported below was set at p < .05, unless 
otherwise specified. All data are reported 
in terms of percentage of correct re- 
sponses. 


Analyses of Criterion Measures 


The initial analysis was made to deter- 
mine whether the 30 forms of the test, each 
of which was based on a random selection of 
items from the total pool, were equivalent. 
A 30 × 5 (Test Forms × Treatments) factorial
analysis of variance was made separately
for intentional, incidental, and total scores.
All statistical tests indicated that the hy- 
pothesis of equivalent criterion measures 
could not be rejected (all ps > .58). Since
the hypothesis could not be rejected on the
basis of this analysis, it was concluded that 
the assumption of equivalent tests was via- 
ble. Accordingly, any significant effects 
identified in subsequent analyses could be 
attributed to treatments. 

A second analysis was made to obtain an 
estimate of the base level of information on 
vitamins known by the subject population. 
The test-only subjects’ scores (M = 16.6) 
were significantly lower than the scores for 
the remaining treatments combined (M = 
61.2). These data indicate that only a min- 
imal number of ideas in the passage were 
known to or could be guessed at by the target 
population. Any change from this base 
knowledge was attributed to study of the 
passage and/or to the experimental treat- 
ment. 


Intentional and Incidental Learning 


The primary analysis was a 2 × 4 mixed
analysis of variance. The between-subjects 
variable was question placement (PreQ, 
PostQ, QBA, and NoQ). The within- 
subjects variable was the kind of criterion 
measure (incidental and intentional). Al- 
though subjects in the NoQ condition did not 
encounter intentional items as such, an 
equivalent score was obtained for compari- 
son purposes on those items employed as 
measures of intentional learning in the other 
treatments. The means for all treatments 
are summarized in Table 1. 
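
The distinction between the two criterion measures can be made concrete with a small scoring sketch. It assumes that intentional items are those criterion items that duplicate an adjunct question and that all remaining items are incidental; the function and the example data are hypothetical, not taken from the study.

```python
def percent_correct(flags):
    """Percentage of True values, or NaN for an empty list."""
    return 100.0 * sum(flags) / len(flags) if flags else float("nan")


def score_subject(correct_by_item, intentional_items):
    """Return (intentional %, incidental %) for one subject.

    correct_by_item maps each criterion item to True or False;
    intentional_items is the set of items that also served as
    adjunct questions during study.
    """
    intentional = [ok for item, ok in correct_by_item.items()
                   if item in intentional_items]
    incidental = [ok for item, ok in correct_by_item.items()
                  if item not in intentional_items]
    return percent_correct(intentional), percent_correct(incidental)


# Made-up example: 40 items, of which items 0-9 were adjunct questions.
answers = {i: (i < 8) or (10 <= i < 25) for i in range(40)}
scores = score_subject(answers, intentional_items=set(range(10)))
print(scores)  # (80.0, 50.0)
```

For the NoQ group the same split is applied to items the subject never saw as questions, which is why the two scores estimate the same performance in that condition.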

The analysis yielded significant main ef- 
fects for question placement, F(3, 116) = 
4.39, and kind of measure, F(1, 116) = 
151.22. The interpretation of these effects 
must be qualified, since the interaction be- 
tween type of learning and placement of 
questions yielded F(3, 116) = 19.96. The 
Newman-Keuls follow-up procedure was 
used to test for simple effects. The follow- 
up tests indicated that performance on the 
intentional measure was significantly lower 
under the PreQ treatment than under either 
the PostQ or the QBA treatment but higher than
under the NoQ treatment. The PreQ treatment
produced significantly lower incidental 
learning (M = 40.6) than all other treat- 
ments combined (M = 59.1), none of which 
was significantly different from the others.

Table 1
Mean Percentages of Correct Responses for Intentional and Incidental
Learning by Subjects in Four Adjunct Question Conditions

Adjunct            Kind of criterion measure
question           Intentional (a)
condition          PreQ      PostQ     Incidental
NoQ                63.7      64.3      64.2
PreQ               69.7      -         40.6
PostQ              -         79.0      58.6
QBA                78.0      77.8      60.2

Note. NoQ = no question, PreQ = preadjunct questions, PostQ
= postadjunct questions, and QBA = combined pre- and
postadjunct questions.

(a) The PreQ intentional score is based on items in the criterion
test that duplicate adjunct questions used in the PreQ condition
and the question before the paragraph in the QBA adjunct
question conditions. Conversely, the PostQ intentional score
is based on items in the criterion test that duplicate adjunct
questions used in the PostQ condition and the question after
the paragraph in the QBA adjunct question conditions.


There was no significant difference between
the NoQ subjects’ performance on
intentional and incidental items (M = 64.1)
and the PreQ subjects’ performance (M =
69.7) on intentional items. In general, performance
on intentional items was significantly
higher than performance on incidental
items in all but the NoQ treatment.
(We wish to reemphasize that because the
NoQ treatment encountered only incidental
items, there should be no difference between
the incidental and intentional scores for that
treatment, an assumption supported by our
data.)


Additivity 


Since the NoQ subjects' intentional and 
incidental scores were, in fact, estimates of 
the same performance, they accounted in 
large measure for the significant interaction 
obtained in the immediately preceding 
analyses. Accordingly, separate analyses 
were made by partitioning the scores ac- 
cording to PreQ (present or absent) and 
PostQ (present or absent) to determine 
whether the interaction was significant as 
one test of the additivity model. 

The analysis of intentional scores failed to 
yield a significant interaction, F(1, 116) < 
1.00, thereby providing partial evidence that 
the effect of question placement (PreQ and
PostQ) was additive. Although the difference
between the obtained mean (M = 77.8)
and the expected mean (M = 84.62) was not
significant, the obtained QBA mean was no
higher than the mean for the PostQ
treatment (M = 79.00) alone. Thus, the
effect of the QBA treatment on intentional 
learning appears only to approximate the 
effects of the additive combinations of PreQ 
and PostQ treatments, thereby providing 
only fragile support for the additive 
model. 
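
The expected mean under additivity can be approximately recovered from the hypothesis (QB - NoQ) + (QA - NoQ) = (QBA - NoQ). The sketch below assumes the reported means of 69.7 (PreQ), 79.0 (PostQ), and 64.1 (NoQ) as inputs; the article does not spell out the computation, so the small gap from the reported 84.62 is presumably due to rounding of those inputs.

```python
def additive_expectation(pre_q, post_q, no_q):
    """Predicted QBA mean if the PreQ and PostQ gains over NoQ simply add."""
    return no_q + (pre_q - no_q) + (post_q - no_q)


# Rounded means as reported in the text (percentage correct).
expected = additive_expectation(pre_q=69.7, post_q=79.0, no_q=64.1)
obtained = 77.8
print(round(expected, 1), round(expected - obtained, 1))
```

On these rounded figures the additive prediction is about 84.6, roughly 7 percentage points above the obtained QBA mean.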

The analysis of incidental learning scores, 
in a further test of the additivity model, 
yielded a significant effect due to PostQ 
(either alone or in combination with PreQ), 
F(1, 116) = 16.24, and a significant interac- 
tion of PostQ and PreQ, F(1, 116) = 31.07. 
Follow-up tests indicated that the PreQ 
performance was significantly lower than 
that of the PostQ treatment, whereas the 
QBA treatment was not significantly higher 
than the remaining groups, that is, NoQ and 
PostQ (see Table 1). Parallel results were 
obtained in the analysis of total scores, ex- 
cept that only the PreQ performance was 
significantly different (lower) than each of 
the other scores.

These data imply an approximation to the
additive model for intentional but not for 
incidental learning. Obviously, the detri- 
mental effects associated with prequestions 
in learning incidental material are elimi- 
nated by the concomitant use of postques- 
tions. 


Effects of Adjunct Questions on 
Attention and Retention 


The relation of adjunct questions to at- 
tention and retention (Boyd, 1973) was ex- 
amined by considering the answers to the 
adjunct questions written at the time of their 
encounter in the study period. The data 
were analyzed via a mixed analysis of vari- 
ance with three levels of the between- 
subjects factor (PreQ, PostQ, and QBA) and 
two within-subjects levels of responses (score 
on adjunct questions and on the criterion 
test). This analysis yielded a significant
interaction of treatments and kind of measure,
F(2, 87) = 4.08. Newman-Keuls follow-up
tests indicated that the differences
among the three treatments were not significant
for scores on the immediate recall
measure.

Figure 1. Incidental and intentional scores on first half versus second half of text for each experimental
treatment. (NoQ = no question, PreQ = preadjunct questions, PostQ = postadjunct questions, QBA
= combined pre- and postadjunct questions, and test = test only. Horizontal axis: blocks of paragraphs.)

As would be expected, overall perfor- 
mance on the criterion measure was signifi- 
cantly lower than immediate recall, F(1, 87) 
= 118.97. Of particular interest, however, 
was the performance (total score) resulting 
from the PreQ treatment: It was signifi- 
cantly lower than that resulting from either 
the PostQ or QBA groups, which in turn, 
were not significantly different from each 
other. 

The results of this analysis indicate that 
attention at the time of learning was equal 
among the three treatments; subjects had 
stored essentially the same information in all 
treatments involving questions. Differences 
among treatments on the criterion test, 
therefore, reflect differences in the kind and 
extent of encoding at the time of learning 
and of accessibility at retrieval. Although 
the present study was not designed to pro- 
vide an answer to what kind of processing 
was employed, these data suggest that PreQ 
treatments may lead to narrowly focused or 
possibly rote (verbatim) processing, whereas
PostQ and QBA lead to more broadly focused
or more meaningful learning, a point
that will be discussed in more detail in the 
Discussion section. 


Serial Effects of Questions 


An analysis was made of criterion scores 
for the first block of five paragraphs com- 
pared to learning on the last block of five 
paragraphs of the passage as measured by 
the respective parts of the criterion test. 
This analysis was conducted to determine 
whether the treatments differentially af- 
fected the learner’s processing strategies as 
the passage was being studied. The data are 
graphically displayed in Figure 1. 

The data were analyzed via a 5 × 2 × 2
analysis of variance. The between-subjects 
variable was comprised of the three treat- 
ments and two controls. The within- 
subjects variables were serial position of 
paragraphs (blocks of paragraphs) and kind 
of criterion measure (incidental or intentional).
In addition to the significant main
effects of treatments, of the kind of criterion
measure, and of the significant interaction
of these two variables (all of which have been
described above), the analysis yielded a significant
interaction of kind of measure and
serial position, F(1, 145) = 6.48. The latter
interaction reflected mean differences (M_D)
in intentional learning (M_D = +1.33) and in
incidental learning (M_D = -3.77) over blocks
of paragraphs. Separate 2 × 2 (Blocks of
Paragraphs × Kinds of Measure) analyses
for each treatment yielded a significant interaction
of the two variables for the PostQ
treatment, F(1, 29) = 10.01,
thereby reflecting an increase in intentional
learning (M_D = 4.6) and a decrease in incidental
learning (M_D = -10.2) over blocks of
paragraphs. The same trend was found for
the combined (QBA) treatment, although it
was not significant; the mean difference between
blocks of paragraphs was a decrease
of 5.0% for incidental and an increase of 1.0%
for intentional learning. The analyses for
the PreQ and NoQ treatments failed to yield 
significant interactions, indicating that the 
performance on intentional items relative to 
performance on incidental items remained 
constant over blocks of paragraphs. The 
PreQ treatment resulted in a decrease of 
2.7% for incidental and an increase of 2.0% 
for intentional learning; the NoQ treatment 
increased .5% from Block 1 to Block 2. 
These data indicate that the regular use of 
prequestions leads immediately and con- 
sistently to a strategy for recalling inten- 
tional information, whereas the no-question 
treatment leads to a relatively immediate 
and consistent strategy for recalling inci- 
dental information. The regular use of 
postquestions (and to a lesser degree, com- 
bined pre/postquestions) leads immediately 
to a strategy for recalling incidental infor- 
mation (that is about as effective as whatever 
strategy was used by the no-question group)
and then changes to a strategy for recalling 
intentional information (see Figure 1). 


Study Time 


Total study time was measured from the 
time the subject started to study the text to 
the time he or she completed the study pe- 
riod. Included in the measure were such 
activities as time to rewrite each question, 
time to answer each question, processing
time, as well as time taken to study the text.
As a result, a “pure” measure of study time
(i.e., without time taken to write questions
and answers) was unavailable. Mean
study times were 14.35 minutes,
19.13 minutes, 23.27 minutes, and 34.33 
minutes for the NoQ, PreQ, PostQ, and QBA 
treatments, respectively. These means are 
ordered about as expected, since no ques- 
tions were answered or rewritten by the NoQ 
group and two questions had to be rewritten 
and answered by the QBA group. Because 
of this contamination, the form of the time 
data prohibited a meaningful analysis of 
their implications. Thus, further interpre- 
tations of the relation between study time 
and performance were not made. 


Discussion 


The condition of this experiment that re- 
quired the subject to write an answer to the 
adjunct question at the time of its appear- 
ance closely approximates the condition 
under which typical study takes place and 
provides a control that the learner will at- 
tend to the material. Obviously, if an ad- 
junct question is placed within the text, the 
intent of the author is that the student will 
use the question by providing an answer. 
For our purposes, the answer was evidence 
that the question was attended to as it was 
encountered and that the subjects knew the 
answer equally well in all conditions at the 
time the paragraph was read. Thus, the 
effects on the criterion test were those ef- 
fects of adjunct questions after initial at- 
tention was controlled. 

The most general conclusion to be drawn 
from this study is that the use of adjunct 
questions adds to the retention and retrieval 
of items represented in the intentional score. 
But when they appear before the paragraph 
to be read, adjunct questions significantly 
interfere with retrieval/retention of inci- 
dental material—a clear disadvantage. 
Even here, PreQ adds less to intentional 
learning than does PostQ. This is not an 
unprecedented result. A similar effect was 
reported by Boyd (1973), who found 44% 
correct recall for PreQ versus 48% correct 
recall for PostQ. Boyd also reports 14% 
forgetting for PostQ versus 33% forgetting
for PreQ. Both Boyd’s and the present results
support the hypothesis that the PreQ
condition enhances selective attention but 
not selective retention. A further explana- 
tion of the results is that PreQ provides no 
further guides (expectations) to the impor- 
tance of the material than is already implied 
in the total experimental situation (NoQ) 
without the discriminative cues provided by 
adjunct questions. This rationale incorpo- 
rates and is consistent with both Boyd’s 
(1973) and Duell’s (1974) explanations of 
their results.

The QBA treatment adds significantly to 
intentional learning that might be achieved 
by PreQ but no more than that achieved by 
PostQ. But QBA has no distinct advantage 
over PostQ or NoQ for incidental learning. 
Invoking Boyd's explanation, we note that 
attention in both PostQ and QBA is the same 
as in the control condition, but selective re- 
tention of intentional material (tapped by 
the postquestion) is enhanced, while the 
forgetting rate of incidental materials is the 
same as that for controls. Furthermore, 
PostQ, used frequently throughout the text 
and at regular intervals, provides salient 
discriminative cues that tend to increase 
intentional learning over time at the expense 
of incidental learning. These cues become
guides for expectations related to what is 
important (and should be remembered) and 
to what is unimportant (and can be forgot- 
ten). These results also support Ausubel's 
(1963) contention that the more the material 
is broken by questions, the more incidental 
learning is inhibited; whereas frequent 
questions tend to improve retention of rel- 
evant (intentional) material (also see Frase, 
1968; Rickards & Di Vesta, 1974). 

In comparison to earlier studies, there 
were some methodological differences in the 
present study that may account for our 
failure to provide more than modest support 
for the additivity model. Rather than em- 
ploy a prequestion and then provide an an- 
swer to the same question as a review 
(Bruning, 1968), our subjects wrote their own 
answers to the questions in all treatments, 
since we were concerned with controlling the 
subject's attention. Similarly, the procedure 
of employing identical questions in both the 
prequestions and the postquestions for the
combined treatment (Boyd, 1973) was only
approximated in our PreQ treatment where, 
although the question was not physically 
present, the presence of the question at the 
end of the paragraph could only be implied, 
since the question was answered but not re- 
written. Finally, different questions were 
employed in both the pre- and postpositions 
of our QBA treatment (i.e., they were not 
identical as in the earlier studies). Although 
the manipulations induced in the present 
study requiring the subjects to answer the 
questions as they were encountered provided 
reasonable assurance that questions received 
the subject's attention, the control of at- 
tention may also have counteracted strong 
additive effects, since additivity in earlier 
studies was assumed to be the result of dif- 
ferences in selective attention and selective 
retention induced by various experimental 
treatments. Attentive reading of a passage 
coupled with the effects of either or both pre- 
and postadjunct questions interspersed 
throughout the text do combine, but in 
complex ways, to contribute differentially to 
incidental and intentional learning. 

The results of this study allow an exten- 
sion of the rationale, provided in the intro- 
duction, related to “intentional” and “di- 
rected" forgetting (Bjork, 1972). Applying 
this orientation to the results of our various 
treatments, it can be seen that at one ex- 
treme are the expectations implied by the 
NoQ treatment that much of the material, 
but not all, ought to be attended to and 
tagged for retrieval. At the other extreme 
is the expectation implied by PreQ that only 
the answer to a given question is demanded 
by the experimenter (text author) and 
therefore requires attention, but even that 
answer is not tagged as sufficiently impor- 
tant for delayed recall. Accordingly, PreQ 
does not necessarily enhance acquisition and 
must seem redundant to the subject when 
compared with a NoQ control. Expectations 
developed by NoQ are dependent on all 
facets of the environment external to the 
experimental conditions and therefore pro- 
vide a base level of performance (control) 
against which the effect of treatments may 
be judged. PostQ can be viewed as inducing 
changing expectations. Initially, PostQ will 
induce the same expectations as NoQ. With
increasing experience, adaptation to PostQ
requires the development of a new expecta- 
tion that the questions asked (and answered) 
provide a criterion for the kind of informa- 
tion to be selected, encoded, and tagged for 
later recall.

In conclusion, the conditions of experi- 
ments on adjunct questions induce expec- 
tations among which are those associated 
with directed forgetting. As voluntary 
processes, rather than control the mainte-
nance of information in long-term store, they
exert considerable control over what can be 
retrieved from long-term memory through 
“recoding activities, which associate some 
items rather than others with retrieval cues 
that may be operative in test situations and 
by influencing the nature of memory search 
processes" (Estes, 1976, p. 9). A fresh per- 
spective for further research on adjunct 
questions is thereby provided. It raises 
questions not only of what control groups to 
use but also of the differential influences on 
encoding of varying combinations and kinds 
of adjunct questions, of the kinds of encoding 
they produce, of ways to maintain enduring 
control of incidental learning, and of how 
they control the "updating" of information 
in memory (Estes, 1976)—all of which are 
questions related to differential control of 
depth of processing and of availability and 
accessibility of information at retrieval.


Level I - Level II Abilities as They Affect Performance
of Three Races in the College Classroom 


Langdon E. Longstreth


It was hypothesized that multiple-choice exams load higher on Level II ability
(g) than on Level I ability (rote memory). Study 1 confirmed this notion by 
correlating multiple-choice exams with the Cognitive Abilities Test, Nonver- 
bal Battery (CAT; r = .35), an index of Level II, and the forward digit span test 
(FDS; r = .17), an index of Level I. Large racial differences were also found, 
with black and Mexican American students (ns = 20 and 31) scoring signifi- 
cantly lower than white and Asian students (ns = 85 and 42) on multiple- 
choice exams and CAT but not on FDS. Study 2 showed that essay tests are 
more highly correlated with multiple-choice tests (r = .68) than either test 
score is with true-false tests (r = .37 and .39), and that white and Asian stu- 
dents (ns = 48 and 18) scored significantly higher on essay and multiple choice 
than black students (n = 30) but did not differ from them on true-false. 


These results successfully extend Jensen’s theory to the college classroom. 


This study explores the interaction be- 
tween racial membership and type of test in 
academic achievement of college students. 
Jensen's Level I — Level II theory of mental 
ability is used as a guide as to what one might 
expect, and the theory is evaluated in the 
light of the data. The theory has not here- 
tofore been applied to the classroom per- 
formance of college students. 

Jensen's theory is based on evidence that 
different kinds of ability are differentially 
distributed as a function of socioeconomic 
background and/or race. It has been de- 
scribed and presented in detail elsewhere 
(Jensen, 1969; 1973, pp. 193-293; 1974). 
Briefly, Level I ability consists of primary 
memory. It is reflected in rote learning, 
paired-associate learning, free recall of un- 
categorized lists, serial learning, and so on. 
Level II ability consists of mental operations 


These studies could not have been carried out without
the cooperation of several people. Jane Bechtol and 
Virginia Wong scored many of the essay examinations. 
Scott Fraser provided multiple-choice test data from 
his course in introductory psychology. Dan Kee pre- 
pared the forward digit span test. Last but not least, 
students in the author’s child development course 
provided the raw data of Study 2. To these colleagues 
and students, I say “thank you.” 

Requests for reprints should be sent to Langdon E. 
Longstreth, Department of Psychology, University of 
Southern California, Los Angeles, California 90007. 


such as elaboration, reasoning, and catego- 
rization. It is reflected in tests of general 
intelligence that have a high g loading, such 
as the Stanford-Binet and Raven's Pro- 
gressive Matrices. Level I ability is often 
measured in terms of learning (recall) of new 
material, while Level II ability is often 
measured in terms of operations on old, 
previously learned material. This dichot- 
omy is not absolute, however; Level II ability 
can also be reflected in new learning by the 
way in which the new information is stored. 
If it is stored serially, for example, or in rel- 
atively unelaborated form, Level I ability 
would be implied; if it is stored categorically 
or in some elaborated form, Level II ability 
would be implied. 

Jensen’s theory is of particular relevance 
to the black-white difference in intelligence 
test scores because it reflects the evidence 
that the racial difference resides primarily 
in Level II differences not Level I differ- 
ences. Thus, in 1974, Jensen reported a 
study of virtually all black and white stu- 
dents in Grades 4, 5, and 6 of the Berkeley 
Unified School District in Berkeley, Cali- 
fornia (Ns = 1,123 black and 1,489 white). 
Level I ability was measured by a digit span 
memory test and Level II ability by the 
Lorge-Thorndike Intelligence Test (Level 
3, Form B), a test primarily of reasoning 
ability. Blacks scored significantly lower
than whites on both tests, and the difference
was approximately twice as great (in sigma 
units) for Level II as for Level I scores. So- 
cioeconomic differences were also associated 
with larger Level II differences than Level I 
differences. It was not possible to com- 
pletely separate racial and socioeconomic 
effects, however, so that an estimate of racial 
differences independent of socioeconomic 
background was not obtained. 

One of the insights provided by the theory 
is to explain what has often been taken as 
evidence of test bias against minority groups. 
It has been observed by many teachers of the
disadvantaged, and the author has noted it 
with disadvantaged college students too, that 
in certain situations, these students seem 
much brighter than their IQ scores or 

achievement scores would lead one to expect. 
Jensen writes, “A lower-class child coming 
into a new class, for example, will learn the 
names of 20 or 30 children in a few days, will 
quickly pick up the rules and the know-how 
of various games on the playground, and so 
on—a kind of performance that would seem 
to belie his IQ, which may even be as low as 
60. This gives the impression that the test
is ‘unfair’ to the disadvantaged child” 
(Jensen, 1969, p. 111). The explanation, of 
course, is that the learning of names and 
rules is a product of Level I ability. The IQ 
score is inaccurate in predicting Level I ac- 
tivities, and the teacher who is not aware of 
this fact calls it “unfair.” Once the dis- 
tinction is made, the IQ scores become fair 
for predicting Level II activities, and other 
tests such as digit span become fair for pre- 
dicting Level I activities. 

Our present concern applies the theory to 
the college level in an attempt to understand 
achievement differences between races in 
that age group. Consider two commonly 
used assessment techniques in college 
courses: true-false and essay examinations. 

A true-false question usually contains a 
simple statement of fact that means the 
same as or means the opposite of some 
statement in the textbook or lecture. As 
such, successful performance seems to rely 
mainly on retention of meaning of the orig- 
inal statement. While Level II ability may 
well be involved in the storage of many such 
meanings, it would not seem to be required:
Repetitive rehearsal, for example, a Level I
activity, might be sufficient to transfer such 
meanings into a relatively permanent storage 
register (cf. Craik & Lockhart, 1972). An 
essay question, on the other hand, often de- 
mands comparison, evaluation, application 
of old facts to new situations, and so on.
These activities would seem to require Level
II ability. If this analysis is at all accurate, 
Jensen's theory would lead us to expect 
greater black-white differences on essay 
examinations than on true-false examina- 
tions. Other racial differences on such ex- 
aminations, such as frequency of non- 
standard phrases, would have to be con- 
trolled, of course, in order to apply Jensen's 
theory. The present study contains such 
controls. 

Multiple-choice questions are a third 
frequently used assessment technique. 
They are a bit more problematical with re- 
gard to Level I — Level II classification. On 
the one hand, they can consist of one bla- 
tantly correct statement and several bla- 
tantly incorrect statements, in which case, 
Level I ability would seem sufficient to arrive 
at the correct answer. On the other hand, 
they can consist of several correct state- 
ments, but only one of which is correct in 
view of the setting provided by the stem 
statement. The following item, taken from 
the “Instructors’ Manual” for Longstreth’s 
(1974) child development textbook, is of this 
type: 


One explanation for the difference in stability of ag- 

gressive behavior from childhood to adulthood for males 

and females is that 

(1) males are more aggressive, on the average. 

(2) females are more non-aggressive, on the average. 

(3) females receive more pressure to abandon aggressive 
behavior as they approach adulthood. 

(4) all of these. (p. 49)


In this question, Alternatives 1, 2, and 3 
are all correct when considered alone, but 
only Alternative 3 leads logically to the 
statement in the stem. Thus, some modi- 
cum of Level II ability would seem to be re- 
quired to evaluate the logical relationships 
between the stem phrase and the response 
alternatives. A good many multiple-choice 
questions seem to be of this type. In terms 
of face validity, then, we tentatively place 
multiple-choice questions somewhere in
between true-false and essay questions in
terms of their Level II loading. We can 
check this placement by studying the inter- 
correlations between the different test 
scores. If essay tests are taken as a Level II 
anchor point, then the relative Level II
loading of multiple-choice and true-false 
questions can be determined by their cor- 
relations with essay scores. 

It would also be desirable to establish the 
relationship between these test forms and 
Level I - Level II ability by more direct
means. To that end, Study 1 reports the 
relationship between multiple-choice scores 
and “outside” indexes of Level I and Level 
II ability. Level I is indexed by a forward 
digit span test and Level II by the Nonverbal 
section of the Cognitive Abilities Test. 
Jensen (1974) has indicated that these two 
tests provide about as pure a measurement 
of the two abilities as it is possible to ob- 
tain. 


Study 1 


Method 


Procedure. Introductory psychology students, in 
groups of about 50, were administered a forward digit 
span test (FDS) and the Cognitive Abilities Test (CAT) 
as part of a course requirement. The FDS consisted of 
tape-recorded digit presentation at the rate of ap- 
proximately one digit per sec, beginning with three 
digits and ending with nine. After each series, the word 
“now” occurred, which was the cue to write down as 
many digits as possible in the order of their occurrence. 
After a 10-sec response period, “ready” was announced, 
followed by the next series. 

The CAT Nonverbal Battery, Level H (Thorndike 
& Hagen, 1974) was administered to obtain a mea- 
surement of Level II ability. The CAT, published in 
1974, is a successor to the well-known Lorge-Thorndike 
intelligence tests. The Nonverbal Battery consists of 
three subtests, the items of which consist exclusively of 
geometric designs of various sorts that must be dealt 
with according to various instructions. These include 
analogies, classification, and synthesis (fitting pieces 
to larger forms). Level H is the most difficult level of 
these tests and is said to be appropriate for Grade 12 
and above. To conserve time, two of the three Level H 
subtests were administered, those measuring classifi-
cation and synthesis. A total of 55 items was in-
volved.

Test-retest reliability of the Nonverbal Battery is 
reported to be in a range from about .60 to .80. Corre- 
lations with the other two batteries, Verbal and Quan- 
titative, are in about the same range, indicating sub- 
stantial common variance, presumably due to a g-type
factor or Level II ability. Correlations with the Stan-
ford-Binet Intelligence Scale range from .65 (Nonver- 
bal) to .78 (Verbal), showing a substantial degree of 
overlap, suggesting again heavy saturation of Level II 
or abstract reasoning ability. The Nonverbal Battery, 
as a result of its geometric items, is probably more cul-
ture fair than the other two batteries or the Stanford-
Binet.

The multiple-choice test utilized was the final ex- 
amination in the introductory psychology class. Con- 
sisting of 40 items, it covered the entire textbook and 
tapped lecture material as well. 

Subjects. The subjects were 220 students from the 
introductory course. They were asked to indicate racial
membership and sex on a background form, and this 
information was checked against visual appearance 
when names were called off at the beginning of the test 
session (there was one contradiction). From this in- 
formation, four racial groups were identified: white 
(85), Asian (Japanese or Chinese, 42), black (20), and 
Mexican American (excluding those who did not spe- 
cifically indicate Mexican American or Chicano descent 
as opposed to those with a Spanish or South American 
origin, 31). These 178 subjects constitute the sample 
of this study. 


Results 


Reliability of the CAT has been discussed. 
It was not possible to establish reliability of 
the multiple-choice test. It will be shown in 
Study 2, however, that such tests do possess 
acceptable reliability. Reliability of the 
FDS was measured by examining the pattern 
of correct and incorrect answers, where a 
correct answer is defined as a complete digit 
series with no errors. Perfect reliability 
would be indicated by consecutive correct 
answers (hits) up to the first error (miss), 
followed by consecutive incorrect answers. 
Exceptions to this pattern would reflect ei- 
ther item unreliability or cheating. Of the 
178 subjects, 155 produced a perfect pattern 
of consecutive hits and misses, indicating 
perfect consistency for 87% of the sample. 
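
The consistency criterion just described (an unbroken run of hits followed by an unbroken run of misses) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the two sample response patterns are hypothetical, and only the counts 155 and 178 are taken from the text.

```python
from typing import Sequence


def is_perfect_pattern(hits: Sequence[bool]) -> bool:
    """Return True when every correct series precedes every incorrect one."""
    seen_miss = False
    for hit in hits:
        if not hit:
            seen_miss = True
        elif seen_miss:
            # A hit after a miss breaks the hits-then-misses pattern.
            return False
    return True


# Hypothetical response patterns for series of three to nine digits.
consistent = [True, True, True, True, False, False, False]
inconsistent = [True, True, False, True, False, False, False]

print(is_perfect_pattern(consistent))    # True
print(is_perfect_pattern(inconsistent))  # False

# Proportion of perfectly consistent subjects, using the counts in the text.
print(round(100 * 155 / 178))  # 87
```

A subject with no misses at all also counts as perfectly consistent under this check, which is why the text goes on to drop uninterrupted runs of eight or nine hits.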

This percentage is inflated, however, due 
to the fact that uninterrupted hits to eight 
and nine digits could not possibly be fol- 
lowed by both a miss and a subsequent hit. 
Uninterrupted hits of eight or nine were 
therefore dropped from the analysis. Of the 
remaining 135 subjects, 113 (84%) showed a 
perfect pattern. This percentage can be 
expressed as a Pearson product-moment 
correlation coefficient by comparing the last 
correct response to the total number of cor- 
rect responses. Perfect item reliability 
would be indicated by identical values for
both members of the pair and, hence, a cor-
relation coefficient of one. The obtained r
is .92. Thus, once an error was made, it was
unlikely that a longer digit span would be
recalled correctly. This indicates not only
high reliability of the part-whole type but
also a relatively low degree of cheating.

Turning to the correlations between the
three tests, scores were first standardized 
across the four races. The mean racial dif- 
ferences were eliminated, setting each racial 
mean equal to the grand mean. With group 
differences thus eliminated, correlations 
between tests for the total sample could be 
calculated without inflation due to group 
differences. The correlations obtained are 
.35 (multiple choice and CAT), .17 (multiple 
choice and FDS), and .20 (CAT and FDS). 
All three correlations are significant at p < 
.05. Of main interest are the two correla- 
tions involving the multiple-choice test, 
which indicate the relative loading of Level 
I (FDS) and Level II (CAT) ability. These 
two correlations are significantly different, 
as evaluated by Hotelling's t ratio, t(175) 
= 2.02, p < .05. 
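
The comparison of the two correlations can be reproduced with a short Python sketch. It assumes the usual form of Hotelling's t for two dependent correlations that share one variable (here the multiple-choice test), with N - 3 degrees of freedom; the text does not spell out the formula, so the function below is an assumption rather than the author's stated procedure. The inputs are the three correlations and the sample size of 178 reported above.

```python
import math


def hotelling_t(r12: float, r13: float, r23: float, n: int) -> tuple[float, int]:
    """Hotelling's t for r12 versus r13, where variable 1 is shared.

    Returns the t ratio and its degrees of freedom (n - 3).
    """
    # Determinant of the 3 x 3 correlation matrix.
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    t = (r12 - r13) * math.sqrt((n - 3) * (1 + r23) / (2 * det))
    return t, n - 3


# r12: multiple choice with CAT; r13: multiple choice with FDS; r23: CAT with FDS.
t, df = hotelling_t(0.35, 0.17, 0.20, 178)
print(f"t({df}) = {t:.2f}")  # t(175) = 2.02
```

Under these assumptions the sketch returns the same value and degrees of freedom as those reported in the text.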

This pattern of correlations holds up in 
three of the four races and breaks down only 
in the smallest group. Proceeding in the 
same order as indicated above, the correla- 
tions for each race are (a) white: .35, .32, and
.17; (b) black: .39, .18, and .49; (c) Asian:
.82, .04, and .21; and (d) Mexican American:
.34, -.04, and .06.

These correlations, coupled with Jensen's
theory of racial-socioeconomic differences 
in Level I — Level II ability, lead to a specific 

prediction about the pattern of mean test 
scores that should be observed in this sam- 
ple. The black and Mexican American 
students at the University of Southern Cal- 
ifornia come from a lower socioeconomic 
background than do either white or Asian 
students, although Asians probably also 
score somewhat below whites. According to 
Jensen's theory, one should therefore find 
that black and Mexican American Level II 
test scores diverge more from white and 
Asian scores than do Level I test scores. The 
discrepancy should thus be relatively large 
on the multiple-choice and CAT tests as 
compared to the FDS test. 
Figure 1 presents mean test scores for each

Figure 1. Mean test scores for four races on three tests:
(a) a multiple-choice final examination (Mult. Ch.); (b)
the Cognitive Abilities Test, Nonverbal Battery, Level
H (CAT; Thorndike & Hagen, 1974); and (c) a forward
digit span test (FDS). (Ns are in parentheses. Mex.
Am. = Mexican American.)


race in terms of the combined within-groups 
SD. White and Asian students scored al- 
most the same on all three tests, while black 
and Mexican American students scored 
much lower on the multiple-choice test and 
CAT, that is, between -.5 and -1.0 SD.
Analysis of variance of these scores, with
race as a between-subjects factor and tests
as a within-subjects factor, produced a sig-
nificant race effect, F(3, 174) = 5.1, p < .01,
and a significant interaction between race
and test, F(6, 348) = 5.6, p < .01. Analysis
of variance of each test separately indicated
a significant race effect on both the multi-
ple-choice test and CAT (with F > 5 and p
< .01 in both cases) and a nonsignificant ef-
fect on FDS (F < 1). Scheffé post hoc tests
of individual means showed that black and 
Mexican American students scored signifi- 
cantly lower than white and Asian students 
on the first two tests, and that blacks scored 
significantly lower than Mexican American 


students on these tests as well (p < .05 in all
comparisons). 


Discussion 


Having anchored the multiple-choice test 
form to Level II ability, interrelationships 
and racial patterns with essay and true-false
test forms can now be investigated with some
profit. We assume that essay tests tap Level 
II ability more than Level I ability, and we 
assume the reverse for true-false tests. This 
assumption can be tested by examining the 
correlations between these test scores and 
multiple-choice test scores, given that the 
latter scores reflect Level II ability more 
than Level I ability. It can then be deter- 
mined if the mean racial-socioeconomic 
differences predicted by Jensen's theory 
show up as clearly as they did in Study 1. 


Study 2 
Method 


Subjects. The sample in this study comes from the 
author's course in child development as taught in four 
consecutive semesters (fall 1975 through spring 1977). 
Complete data are available for 87 white students, 32 
black students, and 28 Asian students. Almost all of 
the black students were female (30/32), and therefore 
sex is held constant (female) in the main racial com- 
parisons, with data for males presented as an aside. 
Racial membership itself was determined nonreactively, 
with the teacher (the author) simply noting the race of 
minority students when calling out names to return 
material of one kind or another and marking the name 
accordingly. 

Tests. Each semester, essay, multiple-choice, and 
true-false tests were given, each over different content 
material. Most of the questions in all tests (about 80%)
were based on material in the textbook Psychological 
Development of the Child (Longstreth, 1974). Mul- 
tiple-choice and true-false tests were answered on a 
separate sheet that was machine scored. Essay answers 
were scored blind by two readers working indepen- 
dently. Students were instructed to write their names 
only on the cover sheet of the examination. This sheet 
was turned over before answers were scored, after which 
the cover sheet was turned back, so that the score could 
be recorded against the correct name. 


Results 


All three tests were standardized across 
races, with mean = 50 and SD = 10. Thus, 
between-tests variance was eliminated, but 
between-races and Race X Test interaction 
variance was not changed. Reliability was 
investigated by different methods depending 
upon test form, since a constant number of 
each kind of test was not administered each 
semester. For essay tests, interrater reli- 
ability was determined. Multiple-choice 
reliability of the test-retest kind was deter-

Table 1
Reliability of the Three Test Forms

                               Multiple choice
               Essay           Test-     Odd-      True-false
Race      N    interrater      retest    even      odd-even
Black     30   .87             .59       .78       .72
Asian     18   .44             .67       .40*      .52
White     48   .75             .61       .68       .60
Total^a   96   .77             .62       .68       .60

Note. Only females are represented in this table. Data are
significant at the .05 level.

^a Data are adjusted for differences in group means.

* p < .10.


mined as well as odd-even reliability. 
True-false tests, like essay tests, were ad- 
ministered more than once in only two se- 
mesters, precluding test-retest reliability 
analysis because of a small N. Odd-even 
reliability was calculated from all true-false 
tests given to each student. These various 
correlations are presented in Table 1. 
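
As a rough illustration of what an odd-even reliability coefficient involves, the sketch below splits each student's item scores into odd-numbered and even-numbered halves and correlates the two half scores. The item matrix is hypothetical, the function names are illustrative, and the text does not say whether any step-up correction was applied afterward, so none is applied here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def odd_even_reliability(items: Sequence[Sequence[int]]) -> float:
    """Correlate each student's odd-item total with the even-item total."""
    odd_totals = [sum(row[0::2]) for row in items]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in items]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)


# Hypothetical data: four students, four items scored 1 (right) or 0 (wrong).
items = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
r_odd_even = odd_even_reliability(items)
print(round(r_odd_even, 3))
```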

Table 1 shows that all the correlations are 
significant except one. Interrater reliability 
on essay tests was surprisingly high for all 
races, perhaps because the author (Reader 
1) took pains to write out an articulate 
scoring key. Odd-even reliability on mul- 
tiple-choice and true-false tests was about 
equal for black and white samples and 
somewhat lower for Asians (due, in large 
part, to smaller variance). Total reliability 
coefficients were determined after adjusting 
individual scores to eliminate mean race 
differences. They indicate reasonably high
reliability for all test forms and, in particular, 
comparable odd-even reliability for the two 
tests that were so evaluated, that is, multi- 
ple-choice and true-false tests. 
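
The adjustment that removes mean race differences before pooling can be sketched as follows. The sketch assumes the adjustment is a simple shift that sets each group's mean equal to the grand mean, as described for Study 1; the function name, scores, and group labels are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def adjust_to_grand_mean(
    scores: Sequence[float], groups: Sequence[Hashable]
) -> list[float]:
    """Shift each score so that its group's mean equals the grand mean."""
    grand_mean = sum(scores) / len(scores)
    by_group: dict[Hashable, list[float]] = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    group_mean = {g: sum(v) / len(v) for g, v in by_group.items()}
    # Within-group deviations are preserved; only the group means move.
    return [s - group_mean[g] + grand_mean for s, g in zip(scores, groups)]


# Hypothetical scores for two groups with different means.
scores = [40.0, 50.0, 60.0, 70.0]
groups = ["a", "a", "b", "b"]
print(adjust_to_grand_mean(scores, groups))  # [50.0, 60.0, 50.0, 60.0]
```

Correlations computed on the adjusted scores then reflect only within-group covariation, so group mean differences cannot inflate them.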

The critical assumption that multiple- 
choice and essay tests share more common 
variance for Level II ability than either one 
with true-false tests can be determined by 
comparing the correlations between test 
forms. For these comparisons and for the 
comparisons of means that follow, all tests 
of a given kind were combined within each 
semester to provide one score for each test 
form for each student. 

The correlations between test forms are 
presented in Table 2. A remarkable con- 
sistency is evident: All races show a higher

Table 2
Correlation Coefficients Between the Test Forms

Race      N    E-MC           E-TF    MC-TF
Black     30   [illegible]    .49     .46
Asian     18   .67            .37     .53
White     48   .63            .30     .31
Total^a   96   .68            .37     .39

Note. Only females are represented in this table. E-MC =
essay versus multiple-choice tests, E-TF = essay versus true-
false tests, and MC-TF = multiple-choice versus true-false
tests.

^a Data are adjusted for group mean differences.


correlation between essay and multiple- 
choice tests than between either of those 
tests and true-false tests. The difference
between the correlation coefficients for the
total sample, with inflation due to mean race
differences eliminated, was tested for
significance by Hotelling's t ratio. The
correlation between essay and multiple
choice is significantly higher than either
of the other two correlations (p < .01
in both cases).

These correlations are differentially at- 
tenuated by the reliabilities of the individual 
tests, of course. Using the reliability coef- 
ficients from Table 1, corrected values for 
the combined sample were computed: For 
the essay versus multiple choice, r = .99; for
the essay versus true-false, r = .54; and for 
the multiple choice versus true-false, r = .64. 
Thus, the obtained pattern of correlations is 
actually magnified when corrections for at- 
tenuation are made. 
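
The correction for attenuation can be illustrated with the classical formula, in which the observed correlation is divided by the square root of the product of the two reliabilities. The sketch below assumes that the total-sample reliabilities used were .77 (essay, interrater), .62 (multiple choice, test-retest), and .60 (true-false, odd-even); the text does not state which Table 1 coefficients entered the correction, so these inputs are assumptions.

```python
import math


def disattenuate(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Classical correction for attenuation: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Assumed total-sample reliabilities (see the lead-in above).
REL_ESSAY, REL_MC, REL_TF = 0.77, 0.62, 0.60

essay_mc = disattenuate(0.68, REL_ESSAY, REL_MC)
essay_tf = disattenuate(0.37, REL_ESSAY, REL_TF)
mc_tf = disattenuate(0.39, REL_MC, REL_TF)

print(f"essay vs. multiple choice:      {essay_mc:.2f}")
print(f"essay vs. true-false:           {essay_tf:.2f}")
print(f"multiple choice vs. true-false: {mc_tf:.2f}")
```

With these rounded inputs the last two values agree with the corrected coefficients reported in the text (.54 and .64), and the first comes out at about .98, close to the reported .99; the small gap is consistent with rounding of the inputs.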

We are thus in a position to predict that 
black and white mean test scores will deviate 

more on the essay and multiple-choice tests 
than on the true-false tests. Figure 2 pre- 
sents the mean scores in SD units. It can be 
seen that the prediction is clearly confirmed. 
Black females performed about .9 SD units 
below white females on essay and multiple- 
choice tests and less than .3 SD units lower 
on true-false tests. The interaction between 
race and test (females only) is significant, 
F(4, 186) = 2.4, p < .05, as is the main effect
for race, F(2, 93) = 7.1, p < .01. Comparing
the races for each test form separately
yielded significant Fs for essay and multiple
choice, F(2, 93) = 8.1 and 8.2, p < .01, and a
nonsignificant F for true-false (F < 1).
Scheffé post hoc comparisons between in- 
dividual means showed that black students 
scored significantly lower than white or 
Asian students on both essay and multiple- 
choice tests (p < .02 in all comparisons). 

The scores for white males are also note- 
worthy. They are significantly higher than 
scores for white females, F(1, 85) = 6.9,
p < .02. We are inclined to attribute this sex
difference to selection factors: Many of the 
males were “hard science" majors (biology, 
pre-med, pharmacy, etc.), while few females 
were. It is likely that the males were thus 
more highly motivated as well as perhaps 
more intelligent. Unfortunately, there were 
not enough males in the other two races to 
examine sex differences in a more complete 
fashion. 

The pattern of mean scores in Figure 2 is 
affected by the different reliabilities of the 
three tests. Expressed as they are in total 
SD units, the differences between means are 
a direct function of the reliabilities of the 
tests involved in any particular comparison. 
Thus, a mean difference of one SD on a test
with perfect reliability differs only by √r SD
on a test with reliability r. When the means
in Figure 2 are adjusted to take differential

Figure 2. Mean test scores for three races on three
tests administered in a child development course. (Ns
are in parentheses. Mult. Ch. = multiple-choice
test. M = male and F = female.)

Table 3
Noncontent Essay Characteristics

                   Grammar   Legibility    Total no.   Words per
Race               errors    (1+ to 7-)    words       sentence
Black (N = 22)
  M                2.7       3.7           212         22
  SD               2.8       1.3           74          6.3
  r^a              0         .22           .16         .31
Asian (N = 14)
  M                1.6       2.9           233         23
  SD               1.5       1.6           63          5.6
  r                -.69*     .61*          .09         -.05
White (N = 28)
  M                1.9       3.3           246         20
  SD               1.8       2.6           72          4.7
  r                -.34      -.24          -.20        -.34
Total
  r                -.35      .01           .02         -.15
  F^b              1.46      2.19          1.43        1.73

^a Data are correlated with content score.

^b F ratio is based on race means. With df = 2, 61, an F of 2.39
is required for significance at p < .10.

* p < .05.


reliability into account, the pattern remains 
essentially the same. If anything, the gen- 
eral pattern is slightly exaggerated in the 
predicted directions. Specifically, the large 
black-white differences on multiple-choice 
and essay tests as compared to true-false 
tests are increased. 
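
The direction of this adjustment follows from the relation stated earlier: a difference of d SD units on a perfectly reliable test appears as d times √r on a test with reliability r, so an observed difference is divided by √r to undo the shrinkage. The sketch below applies that relation to approximate values. The observed differences (.9 and .3) are rounded from the text's "about .9" and "less than .3", and the reliabilities (.77, .62, .60) are assumed total-sample values, so the output is illustrative rather than a reproduction of the author's adjusted means.

```python
import math


def adjust_for_reliability(observed_diff: float, reliability: float) -> float:
    """Undo the shrinkage of a mean difference (in total SD units) by sqrt(r)."""
    return observed_diff / math.sqrt(reliability)


# Approximate black-white differences in SD units and assumed reliabilities.
cases = {
    "essay": (0.9, 0.77),
    "multiple choice": (0.9, 0.62),
    "true-false": (0.3, 0.60),
}

adjusted = {
    name: adjust_for_reliability(diff, rel) for name, (diff, rel) in cases.items()
}
for name, value in adjusted.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed inputs the gap between the essay or multiple-choice difference and the true-false difference widens slightly after adjustment, which is the pattern the text describes.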

As mentioned earlier, the essay scores may
reflect racial differences in linguistic prac- 
tices as well as differences in Level II ability. 
Perhaps aspects of nonstandard English 
influenced the scorers, even though the overt 
purpose of the scorers was to assign scores 
according to meaning. To evaluate this 
possibility, one essay answer of each subject 
in three of the four semesters was examined 
(papers of students in the fourth semester 
were returned before it was decided to make 
this analysis and thus were not available). 
Four characteristics were measured: (a) 
number of misspellings, nonstandard ex-
pressions, and bad grammar, (b) legibility
(1+ to 7—), (c) total number of words, and 
(d) number of words per sentence. Means
and SDs for each characteristic are pre-
sented in Table 3. Correlations with the
content score for that essay question are also
presented.


Two aspects of Table 3 stand out. First, 
the three races do not differ significantly on 
any of the four characteristics (F ratios are 
provided in the bottom row). This finding 
is consistent with the casual impressions of 
the scorers: They did not feel they could
identify a black versus a white answer on the
basis of these characteristics alone. There
were practically no nonstandard expressions
that were associated with a single race:
Most errors were in spelling, not in unusual
expressions. Second, the correlations be- 
tween these characteristics and the content 
score assigned to the answer are generally 
small, with only 2 of 12 reaching an accept- 
able level of significance. It is, therefore, 
obvious that these data can in no way ac- 
count for the large racial differences in con- 
tent scores. 


General Discussion 


These studies seem to indicate rather 
clearly that Jensen’s Level I-Level II theory
has implications for an important real-life 
situation: performance on college exams. 
The first study showed that multiple-choice 
exams tap Level II ability more than Level 
I ability, and that low socioeconomic status
black and Mexican American students score 
significantly lower on such tests than do 
middle socioeconomic status white and 
Asian students but do not differ from them 
on a measurement of Level I ability (FDS). 
The second study showed that essay scores 
correlated significantly higher with multi- 
ple-choice scores than did true-false scores 
and thus presumably are more highly satu-
rated with Level II ability. Mean scores on
all three test forms showed that low socio- 
economic status black female students 
scored significantly below middle socioeco- 
nomic status Asian and white female stu- 
dents on both essay and multiple-choice 
tests but not on true-false tests. All these 
results are quite consistent with Jensen's 
theory. 

At the same time, there are some other 
questions that need to be asked. Of primary 
importance, perhaps, is the “different lan-
guage” hypothesis, summarized by Cazden
(1970) as follows:

It states that all children acquire language but that
many children, especially lower-class black children, 
acquire a dialect of English so different in structural 
(grammatical) features that communication in school, 
both oral and written, is seriously impaired by that 
factor alone. (p. 36) 


According to this hypothesis, blacks have 
more difficulty than Asians or whites in 
comprehending and producing material 
presented to them in standard English. 
Applying this notion to the present results, 
it may be noted that the essay questions re- 
quired both comprehension and production, 
while the other test forms required only 
comprehension; for example, the true-false
and multiple-choice forms did not require 
the student to “produce” an answer but 
simply to recognize the correct one from 
those already “produced” by the teacher. It 
follows that black students should have de- 
parted farthest from white norms on these 
essay questions due to the compound prob- 
lem of comprehension and production. 

This explanation, while accounting for the 
poorer performance of black students on the 
essay questions relative to true-false ques- 
tions, does not explicitly account for their 
equally poor performance on multiple-choice 
questions. Let us assume for the sake of 
argument, however, that more comprehen- 
sion ability is required to distinguish be- 
tween four competing statements than to 
decide if one statement is true or false. With 
this ad hoc assumption, the different lan-
guage hypothesis accounts for both scores
reflecting racial differences, one difference 
in terms of production and comprehension 
deficiency (essay) and one difference in 
terms of comprehension deficiency (multiple 
choice). Let us consider these assumptions 
one at a time. 

The major problem with the production 
deficiency assumption is that there is no 
evidence of a production deficiency in the 
essay answers of the black students in this 
study. Structural characteristics of their
answers were generally indistinguishable 
from structural characteristics of Asian and 
white students. Furthermore, our indexes 
of structure were not significantly related to 
essay scores. Moreover, we are not aware
of any studies that report such a difference
for the written answers of young adults. In
fact, what little evidence there is shows that
spoken usage of black dialect diminishes 
with age (Marwit & Marwit, 1976), leading 
some experts to conclude that it is unlikely 
that racial differences in adult production of 
standard English are detectable at all 
(Wolfram, 1971). Thus, so far as production 
deficiency is concerned, we find no compel- 
ling reason to believe that it accounts for the 
lower essay scores of the black students. We 
are convinced the lower scores reflect the
meaning of what was written, not its form.

Turning to the possibility of comprehen- 
sion deficiency with standard English, it 
appears as though the production deficiency 
of spoken standard English in young black 
children has frequently been taken to imply 
a comprehension deficiency as well. In fact, 
when comprehension is measured indepen- 
dently of production, racial-socioeconomic 
differences usually do not appear. In one 
recent study, for example, production defi-
ciency was found, but comprehension defi-
ciency was not found (Hall, Turner, & Rus-
sell, 1973). Other studies finding no com-
prehension deficiency include those of Quay
(1971, 1972, 1974) as well as Marwit, Marwit, 
and Boswell (1972). Another group of 
studies actually finds better comprehension 
of standard English than of black dialect by 
black children (Eisenberg, Berlin, Dill, & 
Sheldon, 1968; Weener, 1969). We are thus 
in agreement with the conclusion of Hall et 
al. (1973) that “it seems highly questionable 
that speaking the Negro nonstandard En- 
glish dialect hampers standard English 
comprehension” (p. 157). 

There is another group of studies that 
considers the comprehension hypothesis 
specifically as it relates to multiple-choice
tests (Bornstein & Chamberlain, 1970; Liv-
ingston, 1973). The argument is straight-
forward: Minority and low socioeconomic
status students have difficulty answering 
multiple-choice items correctly because of 
the “verbal overload” of these items, that is, 
there is a greater verbal difficulty level than
is required to state the problem. Accord-
ingly, if items are simplified, there should be
an improvement of test scores, particularly
for minority, low socioeconomic status stu-
dents. Both studies tested this prediction
by administering standard items to one
group and simplified items to another group.
Neither study found a difference between 
groups even approaching statistical signifi- 
cance. The Bornstein and Chamberlain 
(1970) study is particularly relevant because 
the sample consisted of about two thirds of 
black students who scored at about the 
thirtieth percentile on national reading 
norms: In spite of the large N (over 800 in 
each group), the two versions of the test 
“yielded virtually no difference” (p. 603). 
Livingston (1973), after reviewing both 
studies, concluded that apparently “students 
who know enough of the content being tested 
to be able to answer a particular test item 
correctly can also read well enough to un- 
derstand the item” (p. 161). 

We would also call attention to Asian and 
Mexican American scores for the additional 
light they shed on these questions. There is 
little doubt that the Asian students came 
from a lower socioeconomic status back- 
ground than the white students. At the 
same time, they probably did not come from 
a background lower in parental push toward 
academic achievement. Conversations with 
many of these students suggest that, if any- 
thing, they were under more pressure to 
succeed. The fact that their scores are 
comparable to the scores of the white stu- 
dents in both studies suggests that back- 
ground conditions of some kinds (e.g., pa- 
rental pressure) may be much more impor- 
tant than background conditions of other 
kinds (e.g., income or education). 

Scores of the Mexican American sample 
are consistent with this point of view. Our 
best guess is that these students came from 
a socioeconomic status background equal or 
lower than that of black students. They also 
are exposed to what would seem to be a se- 
rious linguistic handicap: Many of their 
parents speak only Spanish. Yet, these 
students scored significantly higher than 
black students on both the CAT and a final 
multiple-choice examination (see Study 1). 
The usual socioeconomic status indexes do 
not account for this difference. In this case, 
we have no idea as to what the critical envi- 
ronmental conditions might be, nor have 
genetic factors been ruled out. These are 
questions that take on added importance, 
however, now that the real-life significance
of Level I-Level II differences begins to
come to light.


Encoding of Nonverbal Behavior by High-Achieving and Low-Achieving Children


Vernon L. Allen and Michael L. Atkinson


Adults viewed silent videotapes of high- and low-achieving children and esti-
mated the level of understanding revealed by each child. The stimulus chil-
dren (fourth and fifth graders) had earlier encoded understanding or not un-
derstanding of a lesson by employing only nonverbal responses. The encoding
was either spontaneous or deliberate (role play). Results showed that observ- 
ers accurately differentiated between understanding and not understanding 
in both the spontaneous and deliberate conditions. Furthermore, in the spon- 
taneous conditions, high achievers were perceived as understanding more than 
low achievers in both easy and difficult lessons, and females were perceived as 
understanding more overall than males. Achievement level and sex did not,
however, yield a main effect in the deliberate encoding conditions.


Successful enactment of the role of 
teacher requires that an individual be very 
sensitive and responsive to the behavior of 
the students. Among the more powerful and 
subtle cues that influence one’s behavior in 
any social interaction are the nonverbal re- 
sponses of the other persons (Argyle & 
Kendon, 1967; Ellsworth & Carlsmith, 1973; 
Le Compte & Rosenfeld, 1971). Nonverbal 
behavior is extremely important both in 
teacher-student interactions in regular 
classroom settings (Grant & Hennings, 1971) 
and in special tutoring arrangements in 
which an older child teaches a younger child 
(Allen, 1976). In addition to serving as a 
discriminative or directional cue, nonverbal 
responses from students also provide posi-
tive or negative reinforcement for perfor-
mance of the teacher role. A recent study
demonstrated that a child's nonverbal re-
sponses not only influenced the adult
teacher's own nonverbal behavior but, more
importantly, also significantly affected the
teacher’s evaluation of the child’s intellectual 
and social abilities (Bates, 1976). 

Not all students have the ability or the 
inclination to use nonverbal behavior effec- 
tively as a means of communicating in the 
classroom. It is reasonable to expect that 
students differ widely in the accuracy with 
which they can convey their degree of com- 
prehension of a lesson by the use of nonver- 
bal responses alone. One characteristic of 
students that may be related to nonverbal 
behavior in a learning situation is achieve- 
ment level. We predict that the accuracy of 
encoding comprehension and noncompre- 
hension by nonverbal behavior will be 
greater for high- than for low-achieving
students. Although empirical studies
dealing with differences in nonverbal be- 
havior as a function of students' achievement 
level are lacking, various other behaviors 
have been found to correlate with academic 
achievement (Cobb, 1972; Soli & Devine, 
1976). The present study explores the 
possibility that the behavior of high and low 
achievers differs in a more subtle way,
namely, in the type of nonverbal behavior
that they exhibit in a learning situation.

It is hypothesized that high and low
achievers employ nonverbal cues differently
when they are encoding understanding and
not understanding. Specifically, we expect
that low-achieving children will manifest less
discriminability or differentiation between
the nonverbal cues indicative of high and low
comprehension of a lesson than will
high-achieving children. Two reasons can
be advanced for this prediction. First, low 
achievers may be, in general, simply less re- 
sponsive or expressive nonverbally than high 
achievers. Some available data are consis- 
tent with this possibility. It is well known 
that a greater proportion of males than fe- 
males are identified as low achievers, and 
males are less expressive (i.e., less accurate) 
than females in communicating their emo- 
tional state by nonverbal behavior (Buck, 
1975). Perhaps all low achievers are defi- 
cient in the ability to encode by nonverbal 
responses; if so, their nonverbal behavior 
would remain relatively constant regardless 
of their true cognitive state. That is, when 
low-achieving children are experiencing 
contrasting cognitive states (i.e., under- 
standing and not understanding), much less 
differentiation in their nonverbal responses 
would be evident than for high-achieving 
children. 

A second reason for expecting a difference 
between high and low achievers is based on 
the consequences of learning to cope with 
stressful learning situations in school. Low 
achievers may have learned from personal 
experience that the encoding of nondis- 
tinguishable nonverbal cues is a safe strategy 
to use when facing a learning situation. For 
the low-achieving students, revealing their 
true cognitive state (understanding or not 
understanding) has probably been associ- 
ated in the past with failure and embar- 
rassment or has resulted in a greater degree 
of involvement and responsibility than they 
desired. Therefore, a pattern of rather
uniform and nondiscriminable nonverbal 
behavior that effectively conceals one's true 
level of understanding may have been 
adopted as a safe tactic to use in classroom 
interactions with the teacher. Used at first 
intentionally, this stratagem could
eventually evolve into a habitual (and almost
automatic) style of behaving in all learning 
situations. 

If a difference were found in the spontaneous
nonverbal behavior displayed by high and
low achievers, it would be susceptible to in-
terpretation either in terms of ability deficit
or social tactic. Results can be clarified 
considerably, however, by obtaining data on 
deliberate (role play) behavior as well. If 
any difference found between the nonverbal 
behavior of the high and low achievers were 
also demonstrated under the deliberate 
condition (role play), it would suggest that 
low achievers are lacking in encoding ability 
(deficit explanation). On the other hand, if 
any obtained difference between high and 
low achievers in spontaneous behavior were 
nullified in the role play condition, it would 
suggest that low achievers use nondiscrimi- 
nable nonverbal behavior as an interpersonal 
strategy in learning situations (social tac- 
tic). 

The present study also investigated the 
effect of sex of the children. Previous re- 
search has found that females tend to be 
more expressive than males in their non- 
verbal responses to stimuli (Buck, 1975). As 
a consequence, decoding of nonverbal be- 
havior by other persons is more accurate 
when communicated by females than by 
males. We predict that the difference in 
perceived understanding between difficult 
and easy lessons will be greater for females 
than for males. As a caveat, it should be 
pointed out, however, that almost all previ- 
ous studies have used role play only to in- 
vestigate sex differences in the encoding of 
nonverbal behavior. (That is, subjects are 
instructed to convey intentionally a specified 
emotion to a target person.) It is possible 
that sex differences in nonverbal encoding 
would be less pronounced with spontaneous 
than with role play behavior. 

To test the predictions discussed above, 
a study was designed in which observers 
viewed a series of videotapes of children si- 
lently listening to a lesson. In each segment, 
a child was shown actually listening to an 
easy or difficult lesson (i.e., understanding 
or not understanding) or only pretending 
(role playing) to be understanding or not 
understanding a lesson. The observers 
made judgments about the amount of un- 
derstanding of each child. These judgments 
can be accepted as a measure of the chil- 
dren's success in using nonverbal behavior 
to communicate their level of comprehension 
to other persons.


Subjects


Forty observers (20 males and 20 females) were re- 
cruited from introductory psychology courses at the 
University of Wisconsin-Madison. They received 
course credit for their participation. 


Preparation of Stimulus Tapes 


Fourth- and fifth-grade students from a suburban 
parochial school were recruited by letters sent to par- 
ents. Teachers categorized the students in terms of 
their overall school performance as achieving below
grade level, at grade level, or above grade level. Since 
the school was relatively small, more than one teacher 
was often familiar with a student; many of the classifi- 
cations represented the consensus of two or more of the 
four classroom teachers. From the total group of 75 
students, 20 (12 girls and 8 boys) were selected as high 
achievers and 20 (7 girls and 13 boys) as low achiev-
ers.¹

These 40 children were videotaped as they watched
a lesson on electricity shown on a television monitor.
The first half of the lesson was very easy (e.g., electricity
was compared to water flowing in a river); the second
portion was quite difficult (e.g., the teacher presented
formulae for calculating wattage in both series and 
parallel circuits). Thus, a record of the children's 
spontaneous nonverbal responses to both easy and 
difficult lessons was obtained. 
The children were then told that the experimenter 
wanted to see if another person could determine 
whether a student was understanding the lesson or not. 
The experimenter explained that another teacher would 
watch as they listened to the lesson again. The children 
were asked to try to “trick” this teacher—first to make 
her think that they understood the lesson completely 
while actually listening to the difficult lesson and (at a
signal from the experimenter) also to make the teacher
think that they did not understand at all while actually
listening to the easy lesson. A videotape record was also
made of the children’s nonverbal responses during this
deliberate (role play) encoding session.

The stimulus tapes were prepared by randomly se- 
lecting 16 children: 8 high achievers (4 males and 4 
females) and 8 low achievers (4 males and 4 females). 
The decision to choose 16 stimuli was dictated by the 
quality of the available tapes and by the length of time 
required for subjects to view a stimulus tape. Each of 
the 16 stimulus persons appeared in a random order on 
each one of the four stimulus tapes, with the following 
restrictions: (a) Each stimulus person displayed a 
different type of nonverbal behavior (spontaneous or 
deliberate, understand or not understand) on each 
stimulus tape; (b) on a given stimulus tape, the same 
type of nonverbal behavior did not occur more than 
twice in succession. The appropriate 2.5-minute por- 
tion of videotape from the recording was then edited 
and re-recorded, producing four stimulus tapes. Each 
tape contained 16 2.5-minute segments of nonverbal 
behavior; each segment showed either a male or female
who was either a high or low achiever. A 10-sec segment
of blank videotape was inserted between successive stimulus
persons to allow observers time to record their judg-
ments.
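
The two ordering restrictions can be illustrated with a short sketch. The report does not describe how the random orders were actually generated, so everything below is an assumption made for illustration only: the function names, the seed, the rotation scheme that gives each child a different type of behavior on each tape, the balancing of types within a tape, and the use of rejection sampling to satisfy restriction (b).

```python
import random

# The four types of nonverbal behavior named in restriction (a).
BEHAVIOR_TYPES = [
    ("spontaneous", "understand"),
    ("spontaneous", "not understand"),
    ("deliberate", "understand"),
    ("deliberate", "not understand"),
]


def longest_run(items):
    """Length of the longest run of identical consecutive items."""
    longest = 0
    current = 0
    previous = None
    for item in items:
        current = current + 1 if item == previous else 1
        longest = max(longest, current)
        previous = item
    return longest


def build_tapes(n_children=16, n_tapes=4, seed=0, max_attempts=10000):
    """Hypothetical helper: one list of (child, behavior) segments per tape."""
    rng = random.Random(seed)
    # Restriction (a): rotate each child through the four types, one per tape.
    # Assumption: offsets are balanced so each tape holds every type equally often.
    offsets = [i % len(BEHAVIOR_TYPES) for i in range(n_children)]
    rng.shuffle(offsets)
    tapes = []
    for tape in range(n_tapes):
        for _ in range(max_attempts):
            order = list(range(n_children))
            rng.shuffle(order)
            segments = [
                (child, BEHAVIOR_TYPES[(offsets[child] + tape) % len(BEHAVIOR_TYPES)])
                for child in order
            ]
            # Restriction (b): no type of behavior more than twice in succession.
            if longest_run([behavior for _, behavior in segments]) <= 2:
                tapes.append(segments)
                break
        else:
            raise RuntimeError("no admissible order found")
    return tapes


tapes = build_tapes()
```

Rejection sampling is used here only because it is the simplest way to honor restriction (b); any procedure that yields an admissible order would serve equally well.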


Apparatus


Two Panasonic MV 3085 videotape recorders were
used to construct the four stimulus tapes; one recorder
was employed to play the stimulus tapes to observers.
Observers viewed the stimulus tapes on an RCA 66-cm
Lyceum studio monitor. Seats in the viewing room
were arranged so that each observer had a separate
working space with an unobstructed view of the video 
monitor. 


Procedure 


The experimenter explained that subjects would see 
a videotape of 16 fourth- and fifth-grade students lis- 
tening to either a difficult or an easy lesson. They were 
told that the experiment concerned the inferences that 
people make about others solely on the basis of overt 
behavior and, therefore, that they would view the vid-
eotape without any sound. During a 10-sec interval
after each segment of videotape, the observers re- 
sponded to the following questions about the child by 
using 10-point bipolar scales: (a) “How well would you 
estimate this child understood the lesson?” (“didn’t 
understand at all” to “understood completely”); (b) 
“How much did the child like the lesson that he (she) 
was taught?” (“disliked very much” to “liked very 
much”); (c) “What would you estimate to be the general 
intellectual ability of this child?” (“very low” to “very 
high”). Each set of scales was printed on a separate 
page of the answer booklet. Observers were not aware
that the stimulus persons they were observing were 
sometimes role playing. The entire procedure lasted 
for approximately 1 hour, after which subjects were 
thoroughly debriefed. 


Method of Analysis 


The data were analyzed separately for spontaneous 
and deliberate behavior. In each analysis, a 2 × 2 × 2
× 2 mixed factorial design was used. The between-
subjects factor was sex of subject (male/female).
Within-subjects factors were (a) type of lesson encoded 
by the stimulus children (easy/difficult), (b) achieve- 
ment level of the stimulus children (high/low), and (c) 
sex of the stimulus children (male/female). 
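
The layout of this design can be made concrete by enumerating its cells. The sketch below is illustrative only; the dictionary keys and variable names are assumptions introduced here and do not come from the original analysis.

```python
from itertools import product

# One between-subjects factor and three within-subjects factors, two levels each.
between_subjects = {"sex_of_observer": ["male", "female"]}
within_subjects = {
    "type_of_lesson": ["easy", "difficult"],
    "achievement_level": ["high", "low"],
    "sex_of_stimulus_child": ["male", "female"],
}

# Every observer contributes a score to each of the within-subjects cells.
within_cells = list(product(*within_subjects.values()))

# Crossing in the between-subjects factor gives the full 2 x 2 x 2 x 2 layout.
all_cells = list(product(*between_subjects.values(), *within_subjects.values()))

print(len(within_cells), len(all_cells))
```

Each observer therefore supplies eight within-subjects cell scores per analysis, and the full design has sixteen cells.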


Results 


The major dependent variable in the
present study is the observers’ estimate of
understanding (comprehension) shown by
the stimulus children. Supplementary data
were collected from observers concerning the
stimulus persons' intellectual ability and
liking of the session. The comprehension
measure was strongly correlated with both
liking (r = .66, p < .001) and intellectual
ability (r = .68, p < .001). In view of the
strong correlation with understanding, data
for the two supplementary measures will not
be presented separately.

¹ Appreciation is expressed to David J. Miller and Joseph G. Plazewski for obtaining the videotape recordings.

Table 1
Analysis of Variance for Estimated Understanding (Spontaneous and Deliberate Encoding)

                                        Spontaneous              Deliberate
Source                         df       MS        F              MS         F

Sex of observer (A)             1     17.58      1.26                       .01
Error                          38     13.90                     5.45
Type of lesson (B)              1     19.50      7.38*       1,018.88    138.17***
A × B                           1      3.00      1.14          14.03       1.90
Error                          38      2.65                     7.37
Achievement level (C)           1     53.62      9.47**         3.40        .91
A × C                           1       .03                      .90        .24
Error                          38      5.66                     3.76
B × C                           1      1.38      1.13          19.50       3.51
A × B × C                       1      4.28      3.52            .25        .05
Error                          38      1.22                     5.56
Sex of stimulus person (D)      1     47.28     27.88***        5.25       2.82
A × D                           1       .15       .09            .70        .38
Error                          38      1.70                     1.86
B × D                           1      8.12      1.84           2.28        .37
A × B × D                       1      1.65       .37            .08        .01
Error                          38      4.42                     6.19
C × D                           1      9.45      4.26*          3.00        .92
A × C × D                       1      1.13       .51            .25        .08
Error                          38      2.29                     3.27
B × C × D                       1     11.63      4.59*         29.40       9.77**
A × B × C × D                   1       .90       .86            .90        .30
Error                          38      2.54                     3.01

* p < .05.
** p < .01.
*** p < .001.
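
Each F ratio in Table 1 is the mean square for an effect divided by the error mean square listed beneath it. As a check on how the table is read, the sketch below recomputes the two F ratios for type of lesson from the printed mean squares. The function and variable names are illustrative assumptions, and small departures from the printed F values are to be expected because the printed mean squares are rounded.

```python
def f_ratio(ms_effect, ms_error):
    """F ratio: effect mean square divided by its error mean square."""
    return ms_effect / ms_error


# Type of lesson (B), spontaneous encoding: printed F is 7.38.
spontaneous_f = f_ratio(19.50, 2.65)

# Type of lesson (B), deliberate encoding: printed F is 138.17.
deliberate_f = f_ratio(1018.88, 7.37)

print(round(spontaneous_f, 2), round(deliberate_f, 2))
```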

A one-way analysis of variance was ini- 
tially conducted on the understanding data 
to determine whether any differences existed 
across the four stimulus tapes. Results in- 
dicated that observers did not judge the 
tapes differentially on the understanding 
measure, F(3, 36) = .21, ns. Consequently, 
the data were collapsed across the stimulus 
tapes for the analysis reported below. 


Spontaneous Behavior 


The nonverbal behavior that children had 
encoded spontaneously was subjected to a 2
× 2 × 2 × 2 analysis of variance. Results of
the analysis of variance on the estimated 
understanding data can be seen in the first 
half of Table 1. The analysis yielded a main 
effect for type of lesson (p < .05), indicating 
that the stimulus persons were rated as un- 
derstanding more while listening to the easy 
(M = 5.96) than to the difficult lesson (M = 
5.46). The significant main effect for 
achievement level of the stimulus persons (p 
< .01) revealed that high achievers were 
judged as understanding more than low 
achievers (6.13 vs. 5.30, respectively). And 
female stimulus persons (M = 6.09) were 
rated as understanding more than males (M 
= 5.32), as evidenced in the main effect of sex 
of the stimulus persons (p < .001). 

Two interactions emerged from the anal-
ysis. First, the interaction between
achievement level and sex of stimulus person
(p < .05) indicated that female high achie-
vers were estimated as understanding more
than female low achievers, but there was no
difference between male high and low
achievers. These results are presented in
Table 2.

Table 2
Mean Estimated Understanding as a Function of Achievement Level and Sex of Stimulus Persons (Spontaneous Encoding)

                       Achievement level
Stimulus person        High       Low
Male                   5.56       5.08
Female                 6.67       5.52

Note. Higher numbers indicate more understanding.

The second significant interaction
involved type of lesson, achievement level,
and sex of stimulus person (p < .05). As
shown in Table 3, stimulus persons were
rated as understanding more while listening
to the easy than to the difficult lesson, with
the single exception of the low-achieving
males. The largest difference between the
easy and difficult conditions was found for
the low-achieving females.
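
The pattern just described can be verified directly from the cell means reported in Table 3 for spontaneous encoding. The sketch below computes the easy-minus-difficult difference for each group; the dictionary layout and variable names are illustrative assumptions, and only the means themselves come from the table.

```python
# Table 3 cell means (spontaneous encoding): (easy, difficult).
table3 = {
    ("high", "male"): (5.78, 5.35),
    ("high", "female"): (6.82, 6.52),
    ("low", "male"): (5.05, 5.12),
    ("low", "female"): (6.18, 4.85),
}

# Easy-minus-difficult difference in estimated understanding for each group.
differences = {
    group: round(easy - difficult, 2)
    for group, (easy, difficult) in table3.items()
}

# Group with the largest difference, and any group rated higher on the difficult lesson.
largest = max(differences, key=differences.get)
reversed_groups = [group for group, diff in differences.items() if diff < 0]

print(differences, largest, reversed_groups)
```

Only the low-achieving males show a negative difference, and the low-achieving females show the largest one.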

We predicted an interaction between
achievement level and type of lesson. That
is, we expected that the difference between
estimated understanding in the easy and
difficult lesson conditions would be greater
for high than for low achievers. This inter-
action did not occur in the present analysis,
but the second-order interaction described
in Table 3 enables us to examine the pre-
diction separately for males and females.
Inspection of the difference between the
means of the easy and difficult lessons
(shown in Table 3) indicates that results
were in the predicted direction only for
males, where the difference between means
was greater for high than for low achievers
(.43 vs. −.07). But for females, the results
were contrary to predictions: The mean
difference between the easy and difficult
lessons was smaller for high than for low
achievers (.30 vs. 1.33).

Table 3
Mean Estimated Understanding as a Function of Type of Lesson, Achievement Level, and Sex of Stimulus Persons (Spontaneous Encoding)

                          Type of lesson
Stimulus person          Easy      Difficult
High achievers
    Male                 5.78        5.35
    Female               6.82        6.52
Low achievers
    Male                 5.05        5.12
    Female               6.18        4.85

Note. Higher numbers indicate more understanding.


Deliberate Behavior


Shown in the second half of Table 1 are
the results of a 2 × 2 × 2 × 2 analysis of
variance conducted on the observers’ esti-
mated understanding for the deliberate or
role-played behavior. Consistent with re-
sults for spontaneous behavior, a main effect
for type of lesson (easy/difficult) was found
for deliberate behavior (p < .001). Inspec-
tion of the means indicates that stimulus
persons were rated as understanding more
while pretending to listen to an easy lesson
than to a difficult one (6.78 vs. 3.21). In
addition, a significant second-order inter-
action emerged that involved difficulty of
lesson, achievement level, and sex of stimu-
lus person (p < .005). The means for this
interaction are presented in Table 4. As can
be seen, in all conditions, the stimulus per-
sons were rated as understanding signifi-
cantly more while deliberately encoding
comprehension than while encoding non-
comprehension. Thus, when asked to en-
code comprehension and noncomprehen-
sion, all stimulus persons were able to do so
successfully. Table 4 also reveals that the
difference in perceived understanding be-
tween the easy and difficult lessons varied as
a function of achievement level and sex of
the stimulus person. For high achievers, the
difference between easy and difficult con-
ditions was greater for males than for fe-
males; for the low achievers, results were in
the opposite direction. No other interac-
tions emerged from the analysis.

Table 4
Mean Estimated Understanding as a Function of Type of Lesson, Achievement Level, and Sex of Stimulus Persons (Deliberate Encoding)

                          Type of lesson
Stimulus person          Easy            Difficult
High achievers
    Male                 6.              [illegible]
    Female               [illegible]     [illegible]
Low achievers
    Male                 6.68            [illegible]
    Female               7.18            [illegible]

Note. Higher numbers indicate more understanding.

Table 5
Mean Estimated Understanding as a Function of Type of Lesson in the Spontaneous and Deliberate Encoding Conditions

                          Type of lesson
Type of encoding         Easy      Difficult
Spontaneous              5.96        5.46
Deliberate               6.78        3.21

Note. Higher numbers indicate more understanding.


Comparison of Spontaneous and 
Deliberate Behavior 


An additional analysis was undertaken to 
compare directly the spontaneous behavior 
of stimulus persons with their deliberate 
(role-played) behavior. Data relevant to 
this comparison are presented in Table 5. 
As can be seen from the table, the difference 
between means in the estimated under- 
standing in the easy and difficult lessons was 
much smaller for spontaneous (.50) than for 
deliberate (3.57) encoding. To assess the 
significance of this difference, an analysis of 
variance was conducted. Results of the 
analysis revealed a significant interaction 
between type of lesson and type of encoding, 
reflecting the greater differentiation in un- 
derstanding when the encoding by stimulus 
persons was deliberate as compared to 
spontaneous, F(1, 38) = 71.46, p < .001. 
The absolute scale values of the means 
shown in Table 5 are also of interest. For 
spontaneous behavior, the means for un- 
derstanding the easy and difficult lessons 
were near the neutral or midpoint of the 
10-point scale of understanding. By con- 
trast, the understanding scores for deliberate 
behavior were displaced toward the end- 
points of the scale. 
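
The two descriptive points made from Table 5 can be reproduced with simple arithmetic. The sketch below uses the four printed means. The variable names are illustrative, and the midpoint of 5.5 rests on the assumption that the 10-point scale ran from 1 to 10, which the report does not state explicitly.

```python
# Table 5 means: (easy, difficult) for each type of encoding.
table5 = {
    "spontaneous": (5.96, 5.46),
    "deliberate": (6.78, 3.21),
}

# Easy-minus-difficult spread within each type of encoding.
spread = {
    encoding: round(easy - difficult, 2)
    for encoding, (easy, difficult) in table5.items()
}

# How much larger the spread is under deliberate encoding.
contrast = round(spread["deliberate"] - spread["spontaneous"], 2)

# Assumption: the 10-point scale runs from 1 to 10, so its midpoint is 5.5.
midpoint = (1 + 10) / 2
distance_from_midpoint = {
    encoding: tuple(round(abs(mean - midpoint), 2) for mean in means)
    for encoding, means in table5.items()
}

print(spread, contrast, distance_from_midpoint)
```

Under that assumption, both spontaneous means lie within half a scale point of the midpoint, whereas both deliberate means lie more than one point away from it.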


Discussion 


In the present study, we examined the
ability of children to communicate accu-
rately their comprehension and noncom-
prehension of a lesson by the use of nonver-
bal cues alone. Results revealed that ob-
servers could discriminate accurately be-
tween different levels of understanding when
children were listening to easy and difficult 
lessons. Greater understanding was re- 
ported for children in the easy than in the 
difficult condition. It is clear, therefore, 
that children unintentionally disclose their 
degree of comprehension by nonverbal re- 
sponses even when merely listening passively 
to a lesson. Since perceived comprehension
was strongly correlated with observers’ at- 
tributions of intelligence, it is obvious that 
children’s nonverbal behavior in the class- 
room can have very important implica- 
tions. 

We hypothesized that low achievers would 
show less nonverbal differentiation between 
the easy and difficult lessons than would 
high achievers. Support was found for this 
hypothesis for males but not for females. As 
predicted, low-achieving males did not dif- 
ferentiate between the easy and difficult 
lessons, but high-achieving males did. 

The clearest and strongest finding con- 
cerning the achievement factor was, how- 
ever, a great deal simpler than the formula- 
tion offered by our hypothesis. The ob- 
servers perceived that high achievers un- 
derstood more than low achievers in both the 
easy and difficult conditions—and this 
pattern of results was even stronger for fe- 
males than for males. This finding cannot 
be dismissed as being an artifact due to a real
difference in understanding of the lessons by
the high- and low-achieving children. The 
lessons were especially chosen after consid- 
erable pilot work had indicated that the easy 
lesson was readily understandable and the 
difficult lesson was definitely not under- 
standable by children of the age used in this 
experiment. Furthermore, the children in 
the experiment reported their level of un- 
derstanding on a scale after listening to the 
two lessons. Results for this self-report 
measure showed that the mean under- 
standing scores of the high- and low- 
achieving children were very similar and did 
not differ significantly. 

It can be concluded, then, that in learning
situations, a higher level of understanding is
conveyed by the nonverbal responses given
by high- than by low-achieving children— 
even in the absence of any difference in their 
actual understanding. One possible expla- 
nation for this finding is based on the past 
experience of high and low achievers in 
learning situations. Because of their fre- 
quent success in the past, the high achievers 
may approach a new lesson with confidence 
and with the strong expectation of their 
being able to understand it. Even when the 
lesson is difficult and not really understood, 
the high achiever might still believe that he 
or she has attained some understanding. 
This pervasive cognitive set of expecting to 
understand could be unintentionally con- 
veyed by the nonverbal cues emitted by high 
achievers. Conversely, persons with a his- 
tory of poor academic performance might 
approach a new learning situation with the 
expectation of not being able to understand, 
which would also be revealed by their non- 
verbal responses. This hypothesis could 
easily be tested. Distinctive expectations 
could be established by experimentally 
creating success or failure experiences. It 
would be predicted that in a subsequent 
learning situation, these expectations would 
result in different patterns of nonverbal re- 
sponses. 

Results also showed that females were 
perceived as understanding more than 
males. This finding may simply be another 
instance of a well-known halo effect: People 
believe that physical attractiveness is asso- 
ciated with intelligence and other socially 
desirable traits (Dion, Berscheid, & Walster, 
1972). Perhaps perceived understanding is 
correlated strongly with physical attrac- 
tiveness. If so, it could be argued that the
obtained sex difference is due to differential
attractiveness of the boys and girls and not
to a difference in their encoding of nonverbal
behavior. If girls are considered to be more
attractive in general than boys, the sex dif-
ference in perceived understanding could be
accounted for by the physical attractiveness 
variable. The same explanation might be 
applicable as well to the finding of greater 
understanding for high- as opposed to low- 
achieving children. Thus, the greater per- 
ceived understanding of high-achieving 
children might simply be due to their being
more attractive than low-achieving children
rather than to a difference in their nonverbal
encoding of understanding and not under-
standing.

The physical attractiveness interpretation
of the sex and achievement findings can be
tested by correlating ratings of under-
standing with an independent measure of
the children’s physical attractiveness. Ac-
cordingly, we prepared stationary videotape
pictures of all stimulus children and pre-
sented them to 10 judges for ratings of
physical attractiveness. Physical attrac-
tiveness scores of the children were corre-
lated with perceived understanding scores
in easy and difficult lesson conditions. The
correlation coefficients did not differ sig-
nificantly from zero. Therefore, the at-
tractiveness interpretation cannot account
for the difference in comprehension obtained
as a function of sex and achievement level of
the children.

Data for the children's role playing of
understanding and not understanding help
to clarify the findings obtained from their
spontaneously encoded behavior. The most
notable finding for the deliberate (role play)
behavior was the consistency with which the
children successfully encoded understanding
and nonunderstanding. Across all catego-
ries of sex and achievement level, the chil-
dren successfully communicated the dis-
tinction between understanding and not
understanding by their nonverbal behavior.
It appears that the children held similar role
expectations concerning the nonverbal be-
haviors that comprise different levels of
understanding, and that they were also able
to perform the appropriate set of overt be-
haviors upon request. Data for the delib-
erate behavior indicate clearly that results
from the spontaneous condition for sex and
achievement level were not due to differ-
ences in the children's role expectations or
in their ability to perform the appropriate
behaviors.

The results for deliberate behavior were
much simpler than for spontaneous behav-
ior; that is, fewer main effects and interac-
tions were obtained. Spontaneous behavior
was more sensitive to the personal charac-
teristics of the children such as sex and
achievement level. The simpler pattern of
results for deliberate, as compared to spon-
taneous, behavior is consistent with findings
from research on simulation or role playing 
in psychological experiments. When per- 
sons are asked to simulate being naive 
subjects in psychological experiments, they 
frequently can reproduce the simple results 
actually obtained in the experiments; but the 
more complex and subtle results (such as 
interactions among variables) are not dup-
licated (Miller, 1972).


Interaction of Individual Differences with Visual
and Verbal Elaboration Instructions 


Harold D. Delaney 


University of New Mexico 


The current study investigated (a) to what extent the typical effects of elabo- 
ration instructions and the imagery characteristics of verbal material would 
obtain for foreign-language-English word pairs and (b) to what extent individ- 
ual differences would moderate these effects. To explore these issues, verbal 
fluency, visualization ability, and instructions were factorially varied between 
subjects, and the imagery characteristics of the response terms were varied 
within subjects. Forty-eight university students were selected for participa- 
tion in the experiment on the basis of their performance on psychometric 
tests. The result of most interest was the crossover interaction between ver- 
bal fluency and elaboration instructions. Results were discussed in terms of 
Aptitude X Treatment interactions and the importance of individual differ- 
ences in decisions regarding instructional methods. 


Paivio (1969; see also DiVesta, Ingersoll, 
& Sunshine, 1971) has suggested that imag- 
ery could be studied by manipulating stim- 
ulus attributes, by varying instructions in 
learning strategies, or by selecting subjects 
according to their imaging ability. Research 
to date has concentrated on the first two 
variables. For example, many recent studies 
of factors determining the rate of human 
learning have illustrated the importance of 
the instructions given to subjects (e.g., Bu- 
gelski, 1974; Levin, Davidson, Wolff, & Cit- 
ron, 1973; Morris & Stevens, 1974; Rowe & 
Cake, 1974) and of the imagery characteris- 
tics of the material to be learned (e.g., 
Montague & Carter, 1973; Paivio, 1971). 
However, the effects of these variables have 
typically not been investigated in conjunc-
tion with individual differences among
subjects, one notable exception being the 
work by Rohwer and his associates on the 
possible interaction of elaboration instruc- 
tions with age (Rohwer, 1970) and race 
(Rohwer & Ammon, 1971). In fact, little 
work has been done in verbal learning in the 
area of individual differences, even though 
the crucial role of individual differences in
influencing performance in other cognitive 
tasks has been amply demonstrated by Hunt 
and others (e.g., Hunt, Frost, & Lunneborg, 
1973; Hunt, Lunneborg, & Lewis, 1975). 
The relatively small, early literature on in- 
dividual differences in verbal learning has 
been reviewed by Jenkins (1967) and, from 
the point of view of an imagery theorist, by 
Paivio (1971, pp. 477-524).

Certain recent studies in learning have 
explored the possibility of an interaction of 
individual differences with either type of 
stimulus material or type of mediation in-
structions. Stewart (1965; see also Paivio,
1971, p. 206ff) varied both the imagery level 
of subjects as measured by psychometric 
tests and the image-evoking characteristics 
of the material to be learned. In a paired-
associate study, Stewart found that lists 
made up of word pairs were learned in fewer 
trials by low-imagery-ability subjects than 
by high-imagery-ability subjects. However, 
with pictures serving as both stimuli and 
responses, the reverse was true, that is, the
high-imagery-ability subjects learned the
picture lists in fewer trials. Stewart inter-
preted her results as though imagery and
verbal abilities represented opposite ends of 
a bipolar dimension. 

Levin, Divine-Hawkins, Kerst, and 
Guttman (1974) also utilized picture 
paired-associate lists and word paired-as- 
sociate lists but did so as a pretest of 
subjects’ ability to use visual imagery. The 
pattern of abilities demonstrated on these 
paired-associate tasks was predictive of the 
effectiveness of instructions to use a visual 
imagery strategy in learning from prose. In 
comparison to a group that had difficulty 
learning the picture pairs, subjects who 
learned the pictures quickly demonstrated 
significantly better recall performance for 
the content of a reading passage when both 
groups were told to “make up pictures in
their minds” about the story.

Taking a somewhat different approach, 
DiVesta and Sunshine (1974) attempted to 
manipulate the processing used by subjects 
primarily by the instructions they gave 
subjects rather than by the types of stimulus 
materials used. They demonstrated a reli- 
able interaction between subjects' imagery 
ability as assessed by psychometric tests and 
instructions to use different types of elabo- 
rative strategies in a verbal learning task. 
The specific task used was one which re- 
quired subjects to learn a list of words by 
associating them with the items named in a 
previously memorized mnemonic jingle 
(namely, “one is a bun,” “two is a shoe,” and 
so on). High-imagery-ability subjects made
more correct responses when using an ima- 
ginal mediation strategy than when using a 
verbal mediation strategy, whereas the re- 
verse was true for low-imagery subjects. 
DiVesta and Sunshine (1974) did not report 
tests involving verbal ability. However, in 
an unpublished paper, these same authors 
(DiVesta & Sunshine, Note 1) discussed a 
similar study in which subjects were classi- 
fied post hoc as high or low verbalizers ac- 
cording to their Verbal scores on the Scho- 
lastic Aptitude Test. Because of the ab- 
sence of two-way interactions involving 
verbal ability, DiVesta and Sunshine con- 
cluded, “These data imply that imagery 
ability is generally more related to the use of
the strategies represented by the mediators
and mnemonics employed in this study than
is verbal ability” (Note 1, p. 15).

One of the major purposes of the current 
study was to provide a more thorough test of 
the role individual differences in verbal 
ability play in determining the extent to 
which subjects can benefit from instructions 
in mnemonic strategies. Another purpose 
was to investigate the generality of the in- 
teraction reported in certain situations be- 
tween imagery ability and various task 
variables. 

Thus, the study reported here differs from 
other work in several respects. Certain 
features were implemented in the present 
design in order to more closely approximate 
a realistic educational situation in which 
these mnemonic strategies might be applied. 
For example, as foreign language vocabulary 
learning is one of the most frequently cited 
examples of paired-associate learning, for- 
eign language words were used as stimuli in 
the current study. As another example, 
DiVesta and Sunshine (1974), following 
Paivio and Foth (1970), required subjects to 
actually construct physical drawings or write 
out sentences to make explicit the mediating 
link they were using to connect the mne- 
monic cue and the response. While there 
are some educational situations in which 
such a procedure would be feasible, a strat- 
egy that did not require the additional ma- 
terials and time needed for overt production 
would seem to be much more widely appli- 
cable. A further complication of the use of 
overt production methods as pointed out by 
Pressley (1976) is that the time subjects 
spend on a given item is not well controlled. 
Thus, in the DiVesta and Sunshine study, 
time per item was confounded with media- 
tion mode, in that the mean time per item 
was greater for subjects who were drawing 
pictures than for subjects who were writing 
sentences (p < .001). In the current study,
this confound was eliminated by allowing subjects to
spend only a standard amount of time on 
each item. 

The central hypothesis of the current 
study was that instructions in a mnemonic 
strategy would interact with measures of the 
principal ability involved in the application 
of that strategy. Thus, subjects who are
high in a particular verbal ability should
perform better when given instructions to 
use a learning strategy that depends on that 
ability rather than some strategy that only 
minimally relies on that ability. An analo- 
gous result was expected for subjects high in 
imaging or visualization ability. It was also 
anticipated that the typical facilitative ef- 
fects of using concrete as opposed to abstract 
materials would obtain, even though this was 
being manipulated in conjunction with for- 
eign language terms and on the response 
side, where rated imagery normally has less 
effect. One might think intuitively that vi- 
sualization ability would interact with re- 
sponse imagery, with high-visualization- 
ability subjects being able to capitalize more 
than subjects low in this ability on the 
properties of high-imagery words. However, 
Paivio and Yuille (1967) failed to support 
such a prediction in a situation similar to the 
current experiment. Thus, no definite 
predictions were made with respect to this 
interaction. Finally, it was expected that 
relative to a control group that was not in- 
structed to employ any particular strategy, 
specific instructions to elaborate on the 
presented material would facilitate the 
learning of a foreign language vocabulary 
(Raugh & Atkinson, 1975) as it does the 
learning of other verbal materials. 
To test these hypotheses, a multifactor 
paired-associate learning experiment was 
performed, with one factor pertaining to 
subject abilities, one to instructions, and one 
to type of material to be learned. Individual
differences in verbal fluency (VF) and visu- 
alization/spatial ability (VS), which were the 
abilities thought to be most relevant to per- 
formance in the experimental task, were 
assessed by a battery of psychometric tests. 
Equal numbers of subjects were selected on 
the basis of their performance on these tests 
for the following four ability groupings: 
high VS - high VF, high VS - low VF, low
VS - high VF, and low VS - low VF.
Subjects were then randomly assigned to one 
of three instructional conditions, that is, a 
control group, a visual elaboration group, or 
a verbal elaboration group, with each group 
learning word lists for which half the re-
sponse terms were high in rated imagery and
the other half low in rated imagery. Thus,
this study attempted to demonstrate that
knowledge of a subject’s ability profile would
allow a differential prediction to be made
about which of the two elaboration condi-
tions would be more appropriate for his or
her abilities.


Method 


Subjects 


Forty-eight undergraduates at the University of
North Carolina at Chapel Hill served as subjects in the
experiment. These subjects were selected from an
original group of 123 who were given a battery of
paper-and-pencil tests in one of three group testing
sessions. Each subject was given six tests consisting of
two measures of visualization/spatial ability, Flags
(Thurstone & Jeffrey, 1956) and Form Board (French,
Ekstrom, & Price, 1963); three measures of verbal flu-
ency, Spelling Clues (Carroll & Sapon, 1958), Controlled
Associations (French et al., 1963), and Word Beginnings
(French et al., 1963); and a pretest of paired-associate
learning ability, Part V of the Modern Language Apti-
tude Test (Carroll & Sapon, 1958).

The two visualization/spatial ability tests used here 
had been investigated in several previous studies of 
imagery. The Fl... dicate which of several ... be combined to
make ... figure (cf. Bower, ... Three tests see... production
of ... the Word Begi... words beginni...
... paired-associate learning ability
provided a one-trial, study-test method of assessing the 
learning of word pairs consisting of a “Kurdish” stim-
ulus word and an English translation. This variable
was used as a covariate in analyses of the between-
subjects effects.

Composite VS and VF scores were obtained for each
subject by summing the subject’s scores on the visual-
ization/spatial and verbal fluency tests, respec-
tively. [This was e]ssentially equivalent to
using sums of z scores, with the correlations between the
composites and the corresponding sum of z scores being
greater than .99. By excusing subjects who fell near the
[middle of e]ither composite score, 12
subjects were selected from each of the following groups
for participation in the remainder of the experiment:
high VS - high VF, high VS - low VF, low VS - high VF,
and low VS - low VF. Specifically, subjects were ex- 
cused who either had a composite VS score between 153 
and 162 inclusive or had a composite VF score between 
61 and 66 inclusive. The summary statistics for the four 
groups of subjects are shown in Table 1. 
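
The exclusion rule just described can be sketched in code. This is only a minimal illustration under stated assumptions, not the procedure actually used in the study: the function names and the example raw scores are hypothetical, and only the two exclusion bands (153 to 162 for VS, 61 to 66 for VF) are taken from the text.

```python
from typing import Optional, Sequence, Tuple

# Exclusion bands (inclusive) as reported in the text.
VS_EXCLUDED: Tuple[int, int] = (153, 162)
VF_EXCLUDED: Tuple[int, int] = (61, 66)


def composite(raw_scores: Sequence[float]) -> float:
    """Composite score: the plain sum of a subject's raw test scores."""
    return sum(raw_scores)


def level(score: float, band: Tuple[int, int]) -> Optional[str]:
    """Return 'high' or 'low', or None if the score lies inside the excluded band."""
    lower, upper = band
    if lower <= score <= upper:
        return None
    return "high" if score > upper else "low"


def ability_group(vs_tests: Sequence[float], vf_tests: Sequence[float]) -> Optional[str]:
    """Classify one subject, or return None if excused on either composite."""
    vs = level(composite(vs_tests), VS_EXCLUDED)
    vf = level(composite(vf_tests), VF_EXCLUDED)
    if vs is None or vf is None:
        return None
    return f"{vs} VS - {vf} VF"


# Hypothetical raw scores: two VS tests and three VF tests per subject.
print(ability_group([88, 87], [26, 25, 24]))  # composites 175 and 75
print(ability_group([80, 78], [20, 18, 15]))  # VS composite 158 lies in the excluded band
```

Treating scores above a band as high and scores below it as low is an assumption that follows from the bands being described as the region subjects were excused from.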

The selection process was complicated somewhat by 
the existence of a moderate correlation (.39) between 
VS and VF. This positive correlation meant that there 
were more subjects classified as being high on both 
abilities or low on both abilities than were classified as 
high on only one ability. The general procedure fol- 
lowed was to select first the 12 most extreme subjects 
from both the high VS - low VF and the low VS - high
VF categories, where extremity was assessed by the 
difference between the z scores associated with an in- 
dividual's composite VS and VF scores. Then, the re- 
maining 24 subjects were chosen so that the means and 
standard deviations of the selection variables for the last 
two groups would approximate as nearly as possible the 
corresponding values for the first two groups (see Table 
1). 
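
The extremity criterion for the two mixed-ability groups can be illustrated as follows. This is a sketch under assumptions: the subject identifiers and composite scores are invented, z scores are computed over whatever pool is passed in, and the helper names are hypothetical. Only the idea of ranking by the difference between the VS and VF z scores comes from the text.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Subject = Tuple[str, float, float]  # (hypothetical id, composite VS, composite VF)


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardize values against the mean and standard deviation of the pool."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def most_extreme(pool: Sequence[Subject], n: int, high_on: str) -> List[str]:
    """Return the n ids with the largest z(VS) - z(VF) when high_on == 'VS',
    or the most negative difference when high_on == 'VF'."""
    z_vs = z_scores([s[1] for s in pool])
    z_vf = z_scores([s[2] for s in pool])
    diffs = [(zs - zf, s[0]) for s, zs, zf in zip(pool, z_vs, z_vf)]
    diffs.sort(reverse=(high_on == "VS"))
    return [sid for _, sid in diffs[:n]]


# Invented pool of five subjects for illustration only.
pool = [("a", 180, 50), ("b", 170, 60), ("c", 130, 75), ("d", 140, 70), ("e", 160, 62)]
print(most_extreme(pool, 1, high_on="VS"))  # candidate for high VS - low VF
print(most_extreme(pool, 1, high_on="VF"))  # candidate for low VS - high VF
```

The later matching of the remaining 24 subjects on means and standard deviations is not sketched here, since the text does not say how that approximation was carried out.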


Materials 


Two lists of 12 word pairs each were presented to the 
subject. Each pair was made up of a foreign language 
word as the stimulus and an English word as the re- 
sponse. The 24 English words were selected from the 
table presented by Paivio, Yuille, and Madigan (1968), 
which gives concreteness, imagery, and meaningfulness 
ratings for each of 925 words. Twelve words were se-
lected that were high on both rated concreteness and 
rated imagery, and 12 words were selected that were low 
on both types of ratings. The two groups of words were 
also chosen so that their mean meaningfulness ratings 
were almost identical. The means for the high-rated 
and the low-rated imagery words, respectively, were 6.49 
and 3.00 for the imagery ratings, 6.88 and 2.01 for the 
concreteness ratings, and 5.71 and 5.72 for the mean-
ingfulness measures.

The 24 stimulus words were selected from a Malay
dictionary (Daud & Omar, 1973), subject to the con- 
straint that the words be four to six letters in length. 
Malay was selected as the foreign language for use in the
current study because (a) it is phonetically similar to 
English, thus facilitating the application of the elabo- 
ration instructions that some of the subjects were asked 
to use; (b) the language contains very few words that are 
cognates of English; and (c) the current subject popu- 
lation was thought unlikely to have had opportunity for 
previous exposure to the language. These Malay words
were then randomly paired with the English responses, 
and the 24 pairs were divided into two lists, each of 
which consisted of six pairs having a low-rated imagery
response and six pairs having a high-rated imagery re-
sponse. The effects of the two lists did not require
qualification of the basic results of the study, and thus,
the results of separate analyses of these lists will not be
considered.

1 DiVesta and Sunshine's (1974) list of 40 words taken
from the Paivio, Yuille, and Madigan (1968) tables
proved helpful in constructing the current lists, al-
though additional words were required to arrive at
equivalent meaningfulness ratings for the two groups.

Table 1
Summary Statistics on Composite Visualization/Spatial Ability (VS)
and Verbal Fluency (VF) Scores by VF and VS Groupings

                                 Verbal fluency ability
                              High                  Low
Visualization/spatial    Composite score       Composite score
ability                    VS       VF           VS       VF
High
  M                      176.42    75.92       175.67    51.92
  SD                       6.97     4.32         9.82     6.59
Low
  M                      131.92    72.67       133.25    53.67
  SD                      16.17     5.35        14.47     5.05


Procedure 


After the initial group sessions in which subjects 
completed the paper-and-pencil tests including the 
paired-associate premeasure, the experimental tasks 
were administered individually to the 48 selected 
subjects. Each subject was randomly assigned to one 
of the three instructional conditions. 

In all three instructional conditions, subjects began 
by performing a practice task in which they attempted 
to learn seven Malay-English word pairs. The practice 
task was subject paced to relax the subjects and was 
similar to the anticipation method of paired-associate 
learning, in that the experimenter first presented the 
stimulus alone and then the stimulus-response pair on 
white cards. Subjects were asked to read the stimulus 
word aloud when it was first presented to them.
Throughout the practice and experimental tasks, 
subjects were permitted to refer to a card listing the 
possible English responses for the current list. This was 
done in order to conform to the assumptions regarding 
guessing in a mathematical model of learning that was 
used in an analysis not discussed in the current article 
(Delaney, 1975). 

The instructional conditions differed as follows. In
the control condition, the procedures used in a standard
anticipation paired-associate experiment were simply
explained to subjects. In the verbal elaboration con-
dition, subjects were asked to search for lexical or verbal 
relationships between the stimulus and response terms 
of each word pair when the pair was initially presented. 
As an example of a lexical relationship, it was suggested
that subjects look for patterns of letters in the two 
words, such as the repeating letter sequence an in the 
Spanish-English word pair abanico-fan. To arrive at 
a verbal relationship between the words, it was sug- 
gested that the subject construct a phrase or sentence 
using both an English word suggested by the foreign 
language word and the given English translation. For
example, to remember abanico-fan, it was suggested
that one might think of the word banned and construct 
a sentence like The dictator banned fans forever! In 
the practice task, the subject was required to state the 
letter pattern he or she was using as a cue or to state the 
phrase or sentence he or she constructed to connect the 
two words. 

In the visual elaboration condition, subjects were 
asked to construct a mental image of two things inter- 
acting to help them remember the association between 
the two words. The subject was told to think first of a 
concrete English word that sounded like or was spelled 
similarly to the foreign language word and then to 
imagine the object described by the English word in- 
teracting with that described by the translation word. 
For example, it was suggested that in learning aban- 
ico-fan, the foreign language word might bring to mind 
the English words a banjo, and one could envision 
someone playing a banjo in front of a fan. Subjects in 
this condition were given the option of either drawing 
a crude representation of their image or briefly de- 
scribing the image. 

In all conditions, subjects were asked to make a re- 
sponse whenever a stimulus word was presented, 
guessing (if necessary) one of the English responses 
shown on the card given them. The practice list of 
words was presented in a different random order for 
four trials, stopping earlier if the subject achieved a 
criterion of making a correct response to every item on 
the list on a given trial. 

Following the practice task, subjects were told that
their task in the remainder of the experiment was to
learn a small pseudo-foreign language vocabulary. A
Lafayette Model 303A memory drum was used to
present the word pairs in the experimental task, which 
followed a standard anticipation paired-associate par- 
adigm. Although paired-associate studies frequently 
use a faster rate of presentation, such as 1 or 2 sec for 
each frame (cf., e.g., Stewart, 1965), a slower rate was 
chosen for the current experiment because of the 
amount of processing subjects were asked to perform. 
Thus, stimuli and stimulus-response pairs were each
presented for 4 sec. An 8-sec interval separated suc-
cessive trials on a list of words. Approximately 1 min-
ute intervened between completion of the study of one
list and the beginning of study on the other. On each
of the two lists, subjects were run for five trials, or until
they achieved a criterion of two successive anticipations
of all correct responses, whichever came first. The
three subjects who achieved the criterion prior to Trial
5 did so on Trial 4 and were credited with making all
correct responses on the final trial.
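
The stopping and crediting rule can be made concrete with a short sketch. It assumes 12 pairs per list and a maximum of five trials, as stated in the text, and it scores each list over Trials 2 through 5, which is the dependent variable defined in the Results section. The function names and the example trial records are hypothetical.

```python
from typing import List, Sequence

LIST_LENGTH = 12  # word pairs per list
MAX_TRIALS = 5    # trials per list unless criterion is reached earlier


def reached_criterion(correct: Sequence[int]) -> bool:
    """True if two successive trials were both perfect."""
    return any(a == LIST_LENGTH and b == LIST_LENGTH
               for a, b in zip(correct, correct[1:]))


def credited_trials(correct: Sequence[int]) -> List[int]:
    """Pad trials not run after criterion with perfect scores."""
    padded = list(correct)
    padded += [LIST_LENGTH] * (MAX_TRIALS - len(padded))
    return padded


def list_score(correct: Sequence[int]) -> int:
    """Number correct on Trials 2 through 5 (Trial 1 is excluded)."""
    return sum(credited_trials(correct)[1:])


# Hypothetical subject: criterion reached on Trial 4 of the first list,
# all five trials run on the second list.
first_list = [5, 9, 12, 12]
second_list = [3, 6, 8, 10, 11]
total = list_score(first_list) + list_score(second_list)
print(reached_criterion(first_list), total)
```

Under this scoring the maximum possible total is 4 trials x 12 pairs x 2 lists = 96 correct responses.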

In the experimental task as in the practice task, 
subjects were asked to make a response to each stimulus 
word. However, in the elaboration conditions, subjects 
were no longer required to explicitly state their associ- 
ation between the stimulus and response terms. It was 
stressed, however, that the subjects were to use the same 
strategies that they overtly employed in the practice 
situation. 

Word pairs were randomly ordered on each trial, with 
the constraint that at least four items intervene between 
successive presentations of an item (Izawa, 1972; 
Tulving & Arbuckle, 1963). The same predetermined 
random orders were presented to each subject. 
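
One way to generate presentation orders that satisfy the spacing constraint is rejection sampling, sketched below. This is an assumption about how such orders could be produced, not a description of how the original orders were built. The reading of the constraint is also an assumption: since each item occurs once per trial, the gap is checked between an item's position in one trial and its position in the next. The names `make_orders` and `valid_transition` are hypothetical, and the fixed seed stands in for the use of the same predetermined orders for every subject.

```python
import random
from typing import List, Sequence


def intervening(prev_order: Sequence[str], next_order: Sequence[str], item: str) -> int:
    """Number of other items shown between an item's two successive presentations."""
    after_in_prev = len(prev_order) - 1 - prev_order.index(item)
    before_in_next = next_order.index(item)
    return after_in_prev + before_in_next


def valid_transition(prev_order: Sequence[str], next_order: Sequence[str],
                     min_gap: int = 4) -> bool:
    """True if every item has at least min_gap items between presentations."""
    return all(intervening(prev_order, next_order, it) >= min_gap for it in next_order)


def make_orders(items: Sequence[str], n_trials: int, min_gap: int = 4,
                seed: int = 0) -> List[List[str]]:
    """Shuffle repeatedly, keeping a trial order only if it respects the gap."""
    rng = random.Random(seed)  # fixed seed: same orders for every subject
    orders: List[List[str]] = []
    while len(orders) < n_trials:
        candidate = list(items)
        rng.shuffle(candidate)
        if not orders or valid_transition(orders[-1], candidate, min_gap):
            orders.append(candidate)
    return orders


pairs = [f"pair{i:02d}" for i in range(1, 13)]  # placeholder labels for 12 word pairs
for trial, order in enumerate(make_orders(pairs, n_trials=5), start=1):
    print(trial, order)
```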
Following the completion of the final trial for the
second paired-associate list, subjects in all conditions
answered a questionnaire concerning the strategies they
actually employed in the experimental task. Before
beginning the questionnaire, subjects read a brief de-
scription of the three strategies of visual imagery, verbal
association, and repetition. They were told that they
were to indicate for each item which of these strategies
they used. Subjects were also informed that they could
indicate if they employed a strategy for an item that did
not fit in one of the described strategies, if they did not
recall how they attempted to learn a word pair, or if they
remembered that they were unable to make any con-
nection between or even rehearse the two words. Thus,
subjects indicated which one of the six possible cate-
gories best described their behavior on each item. Fi-
nally, in an interview with the experimenter, subjects
were asked to illustrate the type of mediation used for
certain items. One of the purposes of the questionnaire
was to help determine the effectiveness of the elabora-
tion instruction conditions by shedding light on the
question, Did subjects in fact use the strategy they were
instructed to use?


Results 
Recall Performance 


The principal hypotheses concerned the
effects of the between-subjects factors.
These were tested using as the basic depen-
dent variable the number of correct re-
sponses by a subject on Trials 2 through 5,
summed over the two lists, and using as a
covariate in certain analyses the premeasure
of paired-associate learning ability. Thi[...]
premeasure of 19.81,
18.00, and 18.38 correct, the unadjusted
treatment means on the dependent variable
were 50.19, 58.75, and 64.69.

Level of performance within treatment
and ability groups was predicted quite ac-
curately by the premeasure. This predict-
ability was indicated by the highly signifi-
cant test of within-cell regression of the de-
pendent variable on the covariate, F(1, 35)
= 12.29, p < .001. However, since the means
on the premeasure were so similar, the mean
numbers of correct responses were only
slightly changed when adjusted for this
premeasure, becoming 48.27, 60.04, and
65.31 for the control, verbal elaboration, and
visual elaboration groups, respectively. This
improvement of subjects in the elaboration
conditions over that of subjects in the control
condition was highly significant, F(1, 35) =
10.54, p = .003. No significant difference
between the two elaboration conditions was
predicted, and none was observed, F(1, 35)
= 1.10, p = .302.

Figure 1. Mean number of correct responses as a
function of verbal fluency and instructional condi-
tion.
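
The covariance adjustment of the treatment means can be checked with a little arithmetic. The sketch below uses the standard adjustment, adjusted mean = unadjusted mean - b x (group covariate mean - grand covariate mean), where b is the pooled within-cell regression slope. The article does not report b; here it is back-calculated from the control group's reported means, so it is an inferred value and not a reported one. The sketch also assumes equal group sizes and that the premeasure means are listed in the order control, verbal, visual.

```python
# Means as reported in the text (order assumed: control, verbal, visual).
premeasure = {"control": 19.81, "verbal": 18.00, "visual": 18.38}
unadjusted = {"control": 50.19, "verbal": 58.75, "visual": 64.69}
reported_adjusted = {"control": 48.27, "verbal": 60.04, "visual": 65.31}

# Grand covariate mean, assuming equal numbers of subjects per condition.
grand_mean = sum(premeasure.values()) / len(premeasure)

# Slope inferred from the control group only (NOT a value given in the article).
slope = (unadjusted["control"] - reported_adjusted["control"]) / (
    premeasure["control"] - grand_mean
)


def adjusted_mean(group: str) -> float:
    """Standard ANCOVA adjustment of a group mean for its covariate mean."""
    return unadjusted[group] - slope * (premeasure[group] - grand_mean)


print(f"inferred slope: {slope:.2f}")  # about 1.78
for group in ("verbal", "visual"):
    print(group, round(adjusted_mean(group), 2), "reported:", reported_adjusted[group])
```

With the slope inferred from one group, the other two adjusted means come out within about .01 of the reported values, which suggests the reported figures are mutually consistent under these assumptions.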

The data concerning the hypothesis of
primary interest are presented in Figure 1.
As predicted, the verbal elaboration in-
structions resulted in more correct responses
than the visual elaboration instructions for
high-VF subjects, whereas the reverse was
true for low-VF subjects. The a priori con-
trast assessing this directional hypothesis
regarding the interaction of VF level with the
two elaboration instruction conditions was
significant, both with the covariate, t(35) =
2.35, p = .013, and without, t(36) = 1.94, p
= .031. As shown in Figure 1, this interac-
tion is disordinal, as defined by Glaser and
Resnick (1972), in that the lines for the
elaboration treatments intersect when the
mean numbers of correct responses are
plotted as a function of verbal ability.
It will be recalled that an exactly analo- 
gous hypothesis to that just discussed was 
made concerning the interaction of VS 
ability with instructions. However, the test 
of this interaction did not approach significance
(t < 1.00). Although, as shown in
Figure 2, the slope of the line connecting the 
means of the VS ability groups for the visual 
elaboration condition was positive and that 
for the verbal elaboration condition was 
negative, the lines did not intersect for the 
range of ability levels included in the current 
study. 

Regarding the within-subjects factor, the
mean numbers of correct responses were
31.56 and 26.31 for items having concrete
and abstract responses, respectively, as
shown at the bottom right of Table 2. Thus,
the hypothesis concerning the effects of
variation in rated response imagery was
confirmed, F(1, 36) = 50.78, p < .001. None
of the interactions of response imagery with
the between-subjects factors approached
significance, thus replicating Paivio and
Yuille (1967).

Figure 2. Mean number of correct responses as a function of visualization/spatial ability and instructional condition.

Finally, though the principal hypotheses 
concerned the interactions of ability levels 
with treatment conditions, overall main ef- 
fects were also expected to be significant for 
the ability factors. However, neither of 
these tests approached significance, with or
without the covariate (p > .3).


Questionnaire Results 


Only descriptive statistics were computed 
for the responses made by subjects to the 
questionnaire that was administered fol- 
lowing the experimental task. Subjects in 
all conditions reported using a variety of 
strategies, with a verbal strategy being
reported more frequently than any other.
Recall that each of 48 subjects indicated 
which of six categories best described his or 
her behavior with respect to each of 24 items. 
Table 3 presents the mean percentages of 
items for which subjects reported using a 
visual elaboration strategy, a verbal elabo- 
ration strategy, or simply repetition as a 
function of the factors of the design. As may 
be seen in the rightmost column at the bottom
of the table, a verbal strategy was
reported as having been used for 36.6% of the
total of 1,152 responses, whereas repetition
was reported for 23.0% and a visual strategy
for 21.8%. One of the other three categories
was designated for the remaining 18.6%.


These final three categories are not listed
separately in Table 3 because they were
indicated infrequently regardless of the
experimental condition, with 6.9% of the total
responses being “don’t recall,” 6.9% being
“no strategy used,” and 4.8% being “other
strategy.”

As would be expected, the greatest differences
in distribution of reported strategies
resulted from the instructional condition and
response imagery factors. The assertion
that subjects followed the instructions given
them was lent support by the fact that the
largest number of reported uses of a visual
strategy by a treatment group was in the visual
elaboration condition (34.9%), the
largest number of reported uses of a verbal
strategy was in the verbal elaboration condition
(45.3%), and the largest number of
reported uses of repetition was in the control
group (47.1%).

Despite the lack of an interaction of re- 
sponse imagery and elaboration condition in 
determining recall performance, the ques- 
tionnaire data indicated that subjects did 
find it difficult to construct mental images, 
but not verbal associations, connecting for- 
eign-language-English word pairs in the case 
where the English response was an abstract 
noun. This is illustrated by the fact that in 
most of the ability-treatment combinations, 
subjects reported using a visual strategy 
more frequently but a verbal strategy less 
frequently for items with high-imagery re- 
sponses than for items with low-imagery 
responses. In fact, overall subjects reported
using a visual strategy three times as frequently
for pairs having concrete (32.8%)
rather than abstract (10.8%) responses, while
a verbal strategy was reported somewhat
more frequently for abstract items (38.4%)
than for concrete items (34.9%). It is also
interesting to note that subjects made nearly 
twice as many correct responses to items for 
which they indicated they were able to form 
an explicit visual, verbal, or other kind of 
association than to other items. 

As on the primary dependent variable of 
number correct, high- and low-VS ability 
groups differed very little in the pattern of 
strategies they reported using. However, 
the breakdown by levels of VF ability, sum- 
marized in Table 4, revealed interesting 
differences. In general, low-VF subjects 
reported using visual images and repetition 
more frequently but verbal strategies less 
frequently than high-VF subjects, particu- 
larly in the control condition where subjects 
were not given any assistance in arriving at 
mnemonic strategies. However, of the most 
interest and consistent with the interaction 
in the recall data portrayed in Figure 1, were 
the reports of the low-VF subjects receiving 
elaboration instructions. When low-VF 
subjects were given visual elaboration in- 
structions, they apparently were able to 
generate either a visual or verbal association 
for a larger proportion of the items (77.1%) 
than did subjects in any other ability- 
treatment combination. 
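The 77.1% figure can be recovered from the Table 4 percentages by adding the visual and verbal entries within each ability-treatment cell. The following sketch does this for all six cells; the dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# Percentages of items per reported strategy, transcribed from Table 4.
# Keys are (verbal fluency level, instructional condition).
TABLE_4 = {
    ("high", "control"): {"visual": 8.9, "verbal": 35.9, "repetition": 38.5},
    ("high", "verbal"): {"visual": 19.8, "verbal": 47.4, "repetition": 14.6},
    ("high", "visual"): {"visual": 30.7, "verbal": 39.1, "repetition": 12.0},
    ("low", "control"): {"visual": 12.0, "verbal": 16.1, "repetition": 55.7},
    ("low", "verbal"): {"visual": 20.3, "verbal": 43.2, "repetition": 9.4},
    ("low", "visual"): {"visual": 39.1, "verbal": 38.0, "repetition": 7.8},
}


def association_share(cell):
    """Percentage of items with either a visual or a verbal association."""
    return round(cell["visual"] + cell["verbal"], 1)


def cell_with_most_associations(table):
    """Return the (fluency, condition) key with the largest combined share."""
    return max(table, key=lambda key: association_share(table[key]))


if __name__ == "__main__":
    for key, cell in TABLE_4.items():
        print(key, association_share(cell))
    print("largest:", cell_with_most_associations(TABLE_4))
```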
Table 4
Distributions of Reported Strategies as a Function of Verbal Fluency and Instructional Condition

                      Instructional condition
Reported strategy     Control   Verbal   Visual   Row M

High verbal fluency
  Visual                8.9      19.8     30.7     19.8
  Verbal               35.9      47.4     39.1     40.8
  Repetition           38.5      14.6     12.0     21.7

Low verbal fluency
  Visual               12.0      20.3     39.1     23.8
  Verbal               16.1      43.2     38.0     32.5
  Repetition           55.7       9.4      7.8     24.3

Note. All data are percentages.


Discussion


The current study contains a number of 
important results. A unique and perhaps 
the most interesting finding of the present 
study was the interaction of VF ability with 
the two elaboration conditions. It was rea- 
soned in advance that for high-VF subjects, 
the verbal elaboration strategy would pro- 
duce a greater amount of facilitation than 
would the visual elaboration strategy be- 
cause the former would be particularly 
suited to the abilities possessed by these 
subjects. For low-VF subjects, the reverse 
was expected, that is, the visual elaboration 
instructions were expected to produce the 
greater effect. Precisely this result was obtained
in the ordering of mean performances
of the high-VF subjects and in the ordering
of mean performances of the low-VF
subjects. The possibility of such an Aptitude
× Treatment interaction had not been
adequately investigated previously, as the
only previous test made use of a very general
measure of verbal ability, namely, the Verbal
subscale of the Scholastic Aptitude Test.
Thus, it appears that the ability required by
a verbal elaboration strategy is more specific
in nature.

The difference between high-VF and 
low-VF subjects did seem to be in the strat- 
egies or processes they were able to use. As 
may be seen from Table 4, the percentage of 
verbal strategies (40.8%) reported by 
high-VF subjects in all conditions over all 
items was 25% larger than the comparable 
figure for low-VF subjects (32.5%). (Note 
that it is possible the questionnaire revealed
effects on verbal reports of strategies rather 
than actual use of strategies.) This sharply 
contrasts with the breakdown by VS ability, 
where high-VS subjects actually reported 
using slightly fewer imagery strategies than 
low-VS subjects. 

Although the interaction that obtained 
had been predicted, the fact that a low-VF 
group (namely, those receiving visual elab- 
oration instructions) performed better than 
any of the other three subgroups involved in 
the interaction (cf. Figure 1) was unantici- 
pated. Perhaps initial successful perfor- 
mance made possible by an effective, ex- 
perimenter-supplied strategy was particu- 
larly motivating to these low-ability subjects. 
However, greater insight into how this 
anomalous result occurred may be gained 
from a careful analysis of the questionnaire 
data (cf. Table 4). One discovers from these 
subjects’ comments that the tendency, re- 
vealed in the control condition, of the 
low-VF subjects to use a visual imagery 
strategy somewhat more frequently than 
high-VF subjects was greatly magnified in 
the visual elaboration instruction condition. 
An additional effect of the visual elaboration 
instructions was the amelioration of the 
deficit, also seen in the control condition, of 
low-VF subjects in the use of verbal strate- 
gies. In sum, although the performance of 
the low-VF visual elaboration group was 
even better than expected, the disordinal 
interaction that obtained is strong evidence 
in favor of appropriate assignment of indi- 
viduals to instructional treatments on the 
basis of their ability profiles in order to more 
closely approximate maximally efficient 
learning. 

Turning now to a discussion of findings of 
the current study that are consistent with 
previous research, it may be noted that the 
significant main effect for rated imagery of 
the response words replicated in certain re- 
spects work demonstrating the well-known 
finding that more concrete material is easier 
to learn. The fact that the current effect was
highly significant and consistent across all 
treatment groups is perhaps noteworthy in 
that imagery was manipulated on the re- 
sponse rather than on the stimulus side. 
Generally, stimulus imagery manipulations 
have been shown to have a greater effect on
learning (Paivio, 1965), and few, if any, 
studies have manipulated imagery charac- 
teristics of items in conjunction with foreign 
language words. Further, both high- and 
low-imagery-ability subjects found verbal 
material that was easily imaged to be easier 
to retain. 

The finding that asking subjects to mentally
elaborate on the presented material
facilitated learning also replicated previous
work in certain respects (cf., e.g., Paivio &
Yuille, 1967). As has been pointed out by
other investigators, this result has implica- 
tions for educational procedures and sup- 
ports the hypothesis that the more actively 
involved the learner, the greater the amount 
of material learned. The hypothesis that 
the subjects in the elaboration conditions 
made more correct responses because they 
were using higher-order strategies was lent 
correlational support by the increased re- 
ported use of such strategies on the ques- 
tionnaire as compared with responses of 
subjects in the control group. 

Finally, it should be noted that the failure 
to obtain some of the predicted results in the 
current experiment apparently contradicts 
the findings of other studies in the literature. 
For example, the lack of a significant main 
effect of imagery (VS) ability and the lack of 
a significant interaction of imagery ability 
with the kind of elaboration the subject was 
asked to employ are at least superficially in 
conflict with the results of the studies by 
Stewart (1965) and by DiVesta and Sunshine 
(1974) discussed in the introduction. Both 
Stewart and DiVesta and Sunshine obtained 
a significant interaction of VS ability and 
treatment conditions, and the latter authors 
obtained a significant main effect for VS as 
well. The current study indicates certain 
limits to the generality of these effects. 

One may speculate about these findings 
regarding VS ability either by analyzing the 
tests employed or by analyzing the demands 
of the task facing subjects in the experi- 
mental situation. Considering first the 
psychometric tests, the fact that the pre- 
dicted results were obtained for verbal flu- 
ency but not for visualization/spatial ability 
may have hinged upon the particular measures
of these abilities employed in the current
study. The tests assessing verbal fluency
required use of specific skills, such as
the production of synonyms for a given word,
that would appear to be useful in the current
task. In particular, the verbal elaboration
condition required searching for lexical or
semantic relations between (a) the English
translation and (b) the foreign language
word or the key word (Raugh & Atkinson,
1975); knowing a large number of synonyms
for the English translation should facilitate
this search. In contrast, the tests of
visualization/spatial ability were more abstractly
related to the skills required in the visual
elaboration condition. While both the task
and the VS tests allowed use of an imaging
ability, the Flags and Form Board tests,
which measured this ability, required the
transformation or mental rotation of a
presented geometrical figure. The visual
elaboration instructions, however, called for
the production of an appropriate image
connecting the referents of the key word and
the English translation. Perhaps because
of this difference, there was almost no
indication in the current study that the imagery
transformation tests were related to performance
in the experimental task. In fact, the
overall mean number correct for high-VS
subjects was slightly less than for low-VS
subjects (57.83 vs. 57.92). Even in the visual
elaboration condition, while an interesting
negative relationship between VF ability and
reported willingness to use a visual strategy
was suggested (cf. Table 4), there was little
evidence of a relationship between VS ability
and performance (cf. Figure 2).

Nonetheless, use of these tests to assess
imaging ability has allowed prediction of
performance in other situations, as noted
above. Thus, likely reasons for these differences
will be considered briefly. Perhaps
the most probable reason that these discrepancies
were observed is that there are
differences in the nature of the task used
here as opposed to that employed by Stewart
or by DiVesta and Sunshine. Stewart attempted
to control the processing used by
subjects by manipulating presentation mode,
that is, pictures or words. The results of
such a manipulation appear not to be comparable
to the results of attempting to control
visual versus verbal processing by the
instructions given subjects. The principal
differences in the task requirements of the 
current study and that by DiVesta and 
Sunshine are that in the latter (a) a 
subject-recalled English word (i.e., the word 
from the mnemonic jingle) was to serve as 
the stimulus for the subject's response rather
than a foreign language word presented by 
the experimenter; and more importantly, (b) 
the subject was required to overtly produce 
(in written form) the mediating link being 
used to associate the stimulus and response 
terms. Thus, it appears that at least at the 
level of sensitivity of the current study, the 
interaction of visualization ability with 
elaboration instructions is not sufficiently 
robust to manifest itself when these two as- 
pects of the task are altered. 

Because of the lack of a main effect for VS 
ability or of an interaction involving VS 
ability, the only requirement for making 
optimal assignment of subjects to instruc- 
tional conditions for the type of task in- 
cluded in this study would appear to be 
knowledge of a subject's VF ability. To 
maximize mean performance overall, the 
current data suggest that subjects who are 
high in VF ability should be given verbal 
elaboration instructions and subjects who 
are low should be given visual elaboration
instructions. In the present study, the un- 
adjusted mean over these two ability- 
group-treatment pairings was 67.31 com- 
pared with 53.16 for the other four possible 
combinations of VF level and instructional 
treatment or compared with 56.13 for just 
the other groups receiving elaboration
instructions. These differences represent
gains of 27% and 20%, respectively, which are 
substantial enough to be of importance in 
many applied settings. 
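The two percentage gains follow directly from the reported means. As a brief check, the sketch below computes the relative gain of the matched pairings (67.31) over each comparison mean; the function name is illustrative.

```python
# Relative gain of the matched ability-treatment pairings over each baseline,
# using the unadjusted means reported in the text.
MATCHED_MEAN = 67.31
OTHER_FOUR_COMBINATIONS = 53.16
OTHER_ELABORATION_GROUPS = 56.13


def percent_gain(new, baseline):
    """Gain of `new` over `baseline`, expressed as a percentage of baseline."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    print(round(percent_gain(MATCHED_MEAN, OTHER_FOUR_COMBINATIONS)))
    print(round(percent_gain(MATCHED_MEAN, OTHER_ELABORATION_GROUPS)))
```

The unrounded values are about 26.6% and 19.9%, which round to the 27% and 20% stated above.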

The final picture presented by the current 
investigation is a reasonably clear one. This 
study has demonstrated that high-imagery 
responses and elaboration instructions result
in a significant improvement in rate of
learning foreign language vocabulary words. 
However, a more important additional 
finding is that the effects of elaboration in- 
structions must be qualified by their inter- 
action with individual-difference variables. 
It remains to be seen to what extent these 
findings may be generalized to other tasks or 
may be exploited in practical educational
situations. 


and Instruction in Sixth-Grade Mathematics

Three groups (n = 60 each) of sixth-grade students were measured on fluid
and crystallized ability, given instructions capitalizing on one of the two
abilities or no instruction (control), and pre- and posttested for achievement of the
2^n subset mathematical rule. In a multiple regression analysis, using coded
vectors and an a priori ordering of independent variables, pretest scores, gen- 
eral ability (sum of fluid and crystallized abilities), crystallized instruction, 
and the interaction of the crystallized treatment with general ability made sig- 
nificant contributions to the variance of the dependent variable. The distinc- 
tion between the two types of ability as conceptualized in this study did not 
add significantly to the multiple correlation. 


A general intellectual ability factor often 
relates to achievement no matter what the 
instructional treatment (Cronbach & Snow, 
1977). However, there is the possibility that 
instruction developed to capitalize on dif- 
ferentiated measures of ability might show 
differential effect upon performance. 

The studies of Cattell and others (Cattell, 
1963, 1971; Cattell & Butcher, 1968; Horn, 
1965; Horn & Cattell, 1966) have given evi- 
dence supporting the ability distinction la- 
beled by Cattell as fluid and crystallized 
intelligence. Fluid intelligence appears to 
be a factor of general brightness or ability, 
rather independent of schooling or experi- 
ence. Crystallized intelligence can be de- 
scribed as acquired knowledge and skills. 
Cattell’s (1971) investment theory described 
the development of fluid ability and its re- 
lationship to crystallized ability, proposing 
a multilevel model of ability organization 
with these abilities at the top. 

Some such hierarchical view of ability
organization is now commonly accepted.
Cronbach and Snow (1977) noted that within
the structure of abilities, some form of
hierarchical arrangement appeared to exist, with
fluid ability at the peak of the system. They
also suggested that to understand these
higher-order factors, studies were needed in
which both types of ability were measured.

This investigation studied both abili- 
ties—in interaction with instructional 
treatments designed to capitalize on each 
ability—in promoting acquisition of a 
mathematical concept. 


Method 


Subjects 


The sample was obtained from sixth grades in schools 
highly similar in size of sixth-grade enrollment and so- 
cioeconomic factors. Subjects had been assigned to 
classes on a random basis when enrolled, and treatments 
were randomly assigned to classes. Subjects were 
randomly discarded to form two experimental groups 
and one control group of 60 subjects each. 


Treatments 


Two treatments were developed. One was designed 
to capitalize on the pattern of abilities incorporated in 
fluid intelligence (Tf); the second was designed to cap- 
italize on those abilities incorporated in crystallized 
intelligence (Tc). From the different abilities com- 
posing fluid and crystallized intelligence as specified by 
Horn (1967, 1968), two sets of specifications were de- 
veloped. The Tf treatment was constructed to require 
the following pattern of abilities: (a) induction, or the 
ability to discover a general rule from several particular 
incidents; (b) span of apprehension, or the ability to 
recognize and retain awareness of the immediate sur- 
roundings, that is, memory span; (c) general reasoning 
or estimating; (d) associative memory, or the ability to 
aid memory by observing the relationship between 
separate items; and to a lesser degree (e) deductive 
reasoning, or the ability to reason from the general to
the particular. The pattern of abilities incorporated 
into the Tc treatment consisted of (a) verbal compre- 
hension of general information; (b) experiential evalu- 
ation, or commonsense; (c) formal reasoning, or cultur- 
ally logical reasoning; (d) number facility, or the ability 
to do numerical calculations; and to a lesser degree (e) 
general reasoning. 

The five abilities listed for each pattern were those
that seemed most frequently and adequately measured
on the basis of careful examination of the items on all 
subtests of the Institute for Ability and Personality 
Testing Culture Fair Intelligence Test (IPAT; Cattell 
& Cattell, 1963) and the Primary Mental Abilities Test, 
Revised (PMA; Thurstone, 1963). Although there 
appears to be some overlap in the two patterns, they are 
indeed consistent with Horn's writings, especially his 
explanatory article on intelligence (Horn, 1967, Note 
1). General reasoning was included in both lists be- 
cause, as Horn has stated, fluid and crystallized intel- 
ligence represent both distinct and overlapped patterns 
of abilities. This overlap represents alternative 
mechanisms in intellectual performance. In other 
words, a given kind of problem sometimes can be solved 
by the exercise of different abilities. In relation to the 
similarity between deductive reasoning in one list and 
formal reasoning in the other, reference is again made 
to Horn's descriptions. Formal reasoning, according 
to him, depends upon dealing with abstractions and 
symbols in highly structured ways, thus representing 
a crystallized ability. Deductive reasoning does not 
require any learned sequence of problem-solving steps 
and produces a more creative response, thereby re- 
flecting fluid abilities. 

The instructions were designed to teach comprehension
of the fact that for any given set of elements, the
number of subsets of that set was equal to 2^n, where n
equaled the number of elements in the original set.
Each experimental treatment included visuals and a
script (containing instructions to the learners), which
were coordinated mechanically into a slide-tape
presentation. The content of each treatment included set
theory, exponential notation, and probability. To
determine a logical format and examples for the content
of the instruction, Heddens' (1971) guide on concepts
and methods in elementary school mathematics was used.

Four judges, two in mathematics education and two
educational psychologists, met four times to assess the
degree to which the criteria for treatment development
were met by the instructions. On a checklist used by
these judges, final ranks (1 = high and 5 = low) on all
criteria showed strong agreement (median values = 1.0
to 2.5), providing satisfactory validation of the
materials.
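The rule targeted by the instruction, that a set of n elements has 2^n subsets, can be verified by direct enumeration. The sketch below is only an illustration of the rule itself, not a reconstruction of the slide-tape materials; the helper name is hypothetical.

```python
from itertools import combinations


def all_subsets(elements):
    """List every subset of `elements`, from the empty set to the full set."""
    items = list(elements)
    return [
        set(chosen)
        for size in range(len(items) + 1)
        for chosen in combinations(items, size)
    ]


if __name__ == "__main__":
    for n in range(5):
        members = range(n)
        # The enumerated count should match 2 ** n for each set size.
        print(n, len(all_subsets(members)), 2 ** n)
```

Each element is either in or out of a given subset, so every added element doubles the count, which is why the enumeration matches 2^n.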


A test of objectives for each subsection of instruction 
was constructed in two forms, the differences between 
forms being substitutions of symbols and numerals. 
One form was used as a pretest, the other as a posttest.
A pilot study was conducted to determine reliability of
the pre- and posttests and the appropriateness of
treatment presentations. The split-half reliabilities,
stepped up by Spearman-Brown, were low and
unsatisfactory, so considerable revision of both tests was
made before the main study began. On a randomly
selected sample of 30 within each instructional
treatment group, a Kuder-Richardson Formula 20
coefficient was calculated for each test. These coefficients
ranged from .74 to .91, so reliabilities were judged sat- 
isfactory in the main study. Based on the pilot study, 
minor changes were also made to improve the instruc- 
tional administration procedures. 
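For readers unfamiliar with the two reliability indices named above, the following sketch shows the standard formulas: the Spearman-Brown step-up of a split-half correlation, 2r / (1 + r), and Kuder-Richardson Formula 20 for dichotomously scored items. The item-response matrix is invented purely for illustration and has no connection to the study's data; the population variance of total scores is used, which is one common convention.

```python
def spearman_brown(half_test_r):
    """Step up a split-half correlation to full-test length."""
    return 2.0 * half_test_r / (1.0 + half_test_r)


def kr20(scores):
    """KR-20 for a list of examinee rows of 0/1 item scores."""
    n_people = len(scores)
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n_people
    # Population variance of total scores (divide by N).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    # Sum over items of p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in scores) / n_people
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical responses: 5 examinees by 4 items (not data from the study).
EXAMPLE_SCORES = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    print(round(spearman_brown(0.5), 3))
    print(round(kr20(EXAMPLE_SCORES), 3))
```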


Procedure 


During the fall of the school year, the PMA and IPAT 
tests were administered to all subjects. The PMA 
provided a measure of crystallized ability, while the 
IPAT represented a measure of fluid ability. The de- 
viation IQ scores from these tests were used in the sta- 
tistical analysis. In January, the pre- and posttests and 
instructional treatments were administered to all
subjects under standard conditions. The control group 
was pretested 1 day and posttested 2 days later. In the 
same week and in the same 3-day sequence, the two 
experimental groups were pretested on the first day, 
received the 30-minute slide-tape instructions on the 
second day, and were posttested on the third day. 
While the two experimental groups received the in- 
structions, control subjects engaged in unrelated 
classroom activities. 

Neither the pretest nor the posttest was timed. To 
avoid disturbance, those who finished early were given 
puzzles to work, which had no relation to the content of 
instruction.


Design and Analysis


The basic design contrasted three treatments: Tc,
Tf, and control. Multiple regression analysis with
coded vectors for treatments was used to analyze the 
data. To represent aptitude, a sum and difference 
method of handling the Cattell conception of ability 
organization was used because this approach seemed the 
most preferable and parsimonious way of testing 
whether the special distinction between fluid and 
crystallized ability was important beyond the contri- 
bution of the general factor common to each. The 
analysis determined the contributions of pretest, sum 
(fluid plus crystallized ability), difference (fluid minus 
crystallized ability), treatment (instruction), Treatment
× Sum, and Treatment × Difference, in that order, to
posttest performance. Sum represented a composite
general ability. Difference represented special ability
or the difference or distinction between the two abilities.
Treatment × Sum and Treatment × Difference represented
the interactions of treatment with general and
special ability, respectively. (For discussion of this
approach, see Cronbach & Snow, 1977, and Kerlinger
& Pedhazur, 1973.)
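The sum-and-difference representation can be made concrete by building the predictor row for a single student in the a priori order described above. This is a minimal sketch under stated assumptions: treatments are coded as 0/1 indicator vectors with the control group as the reference category, which is only one possible coding scheme, and the function and key names are hypothetical.

```python
def design_row(pretest, fluid_iq, crystallized_iq, group):
    """Build one student's predictors in the a priori order of entry.

    Assumption: 0/1 indicator coding with control as the reference group.
    """
    if group not in ("Tf", "Tc", "control"):
        raise ValueError("group must be 'Tf', 'Tc', or 'control'")
    ability_sum = fluid_iq + crystallized_iq          # general ability
    ability_difference = fluid_iq - crystallized_iq   # special ability
    tf = 1 if group == "Tf" else 0
    tc = 1 if group == "Tc" else 0
    return {
        "pretest": pretest,
        "sum": ability_sum,
        "difference": ability_difference,
        "Tf": tf,
        "Tc": tc,
        "Tf_x_sum": tf * ability_sum,
        "Tc_x_sum": tc * ability_sum,
        "Tf_x_difference": tf * ability_difference,
        "Tc_x_difference": tc * ability_difference,
    }


if __name__ == "__main__":
    # Hypothetical student: pretest 8, fluid IQ 100, crystallized IQ 110, Tc group.
    print(design_row(8, 100, 110, "Tc"))
```

Entering the sum before the difference means the difference term can earn credit only for variance that general ability has not already explained, which is the logic of the ordered test described in the text.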


Results 


Table 1 shows means and standard deviations
for the three groups on the four tests
administered. Analyses of variance revealed
that differences among means significant at
the .01 level existed for the pretest, IPAT,
and PMA results. Tc appeared lower on
average than Tf or control. Means did not
differ, however, on the posttest.

Table 1
Means and Standard Deviations for the Three Treatment Groups

                                  Treatment group
                          Fluid         Crystallized       Control
Test                    M       SD       M       SD       M       SD
Pretest               10.00    4.76     7.03    4.21    10.72    4.93
IPAT (fluid)         103.35   17.43    93.52   20.93   106.50   17.19
PMA (crystallized)   105.43   13.73   102.18   14.34   109.28   11.29
Posttest              13.52    5.48    11.73    8.07    11.30    4.66

Note. n = 60 per group. IPAT = Institute for Ability and Personality Testing Culture Fair Intelligence Test; PMA = Primary Mental Abilities Test, Revised.

Correlations among these same four tests 
are shown in Table 2. As is evident, all 
variables were significantly interrelated. 
Particularly high correlation existed between 
the measures of fluid and crystallized ability 
and also between the pre- and posttests. 

Because of the complexity of the problem 
under investigation associated with the types 
of variables and their relationships, a mul- 
tiple regression analysis with effect coding 
then was calculated. When effect coding is 
used, the regression equation reflects the 
linear model. 

Table 3 presents the results of the multiple
regression analysis. The obtained R was
.73, with R² = .53. The equation for the
overall regression was


Y' = .27 + .75(pretest) + .01(sum)
− .06(difference) + 4.43(Tf) − 10.73(Tc)
− .04(Tf × Sum) + .08(Tc × Sum)
+ .01(Tf × Difference) + .10(Tc × Difference).


Table 2
Intercorrelations for All Groups

Test                  IPAT (fluid)   PMA (crystallized)   Pretest
IPAT (fluid)
PMA (crystallized)        .76*
Pretest                   .49*              .55*
Posttest                  .46*              .50*             .62*

Note. N = 180 (60 per group). IPAT = Institute for Ability and Personality Testing Culture Fair Intelligence Test; PMA = Primary Mental Abilities Test, Revised.
* p < .01.


Pretest accounted for the major proportion
of the variance, and it should be recalled
that there were significant mean differences
among groups prior to treatment on this test.
General ability (sum) made a significant
contribution but accounted for a small (4%)
proportion of variance, given that pretest
scores had already been entered. Special
ability (difference) made no significant
contribution. Thus, the distinction between
the two abilities added nothing to prediction
after removing the effects of pretest and
general ability.

Another significant contribution was
provided by the dummy variable contrasting
Tc and control groups, accounting for 6% of
the variance of the posttest scores. Lastly,
there was a significant contribution made by
the Tc × Sum interaction, with the proportion
of criterion variance accounted for
amounting to 3%. No other terms added
significantly.
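The percentages of variance and the F ratios for the three significant ability and treatment terms can be recomputed from the sums of squares in Table 3. The sketch below assumes each term has 1 degree of freedom and uses the reported error mean square (19.06) and total sum of squares (7,064.96); the variable and function names are illustrative.

```python
# Sums of squares for selected terms, as reported in Table 3 (1 df each).
SUMS_OF_SQUARES = {
    "sum": 265.50,
    "crystallized": 448.94,
    "Tc_x_sum": 236.55,
}
MS_ERROR = 19.06      # reported error mean square (170 df)
SS_TOTAL = 7064.96    # reported total sum of squares


def percent_of_variance(ss):
    """Share of total posttest variance attributed to a term, in percent."""
    return 100.0 * ss / SS_TOTAL


def f_ratio(ss, df=1):
    """F ratio: the term's mean square divided by the error mean square."""
    return (ss / df) / MS_ERROR


if __name__ == "__main__":
    for term, ss in SUMS_OF_SQUARES.items():
        print(term, round(percent_of_variance(ss)), round(f_ratio(ss), 2))
```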

The regression equations for the three 
treatment groups were as follows: 


Y'(fluid) = −4.16 + .75(pretest)
+ .05(sum) − .06(difference),


Y'(crystallized) = −10.46 + .75(pretest)
+ .09(sum) + .04(difference),


and


Y'(control) = .27 + .75(pretest)
+ .01(sum) − .06(difference).


Examination of these coefficients and a 
drawing of the regression plots (not shown) 
suggest a disordinal interaction between the 
Tc treatment, on the one hand, and the fluid 
and control treatments, on the other hand, 
Table 3
Results of the Multiple Regression Analysis

Source               SS         df    MS         % variance    F
Pretest              2,746.65     1   2,746.65      .39       144.11
Sum (A)                265.50     1     265.50      .04        13.93
Difference (B)           5.17     1       5.17      .00          .27
Fluid (C)               58.71     1      58.71      .01         3.08
Crystallized (D)       448.94     1     448.94      .06        23.55
A × C                   11.19     1      11.19      .00          .59
A × D                  236.55     1     236.55      .03        12.41
B × C                   22.54     1      22.54      .00         1.18
B × D                   29.99     1      29.99      .00         1.57
Error                3,239.72   170      19.06      .47
Total                7,064.96   179                1.00


with general ability; the point of intersection 
was within the range of ability scores. In 
other words, the crystallized treatment had 
the most effect on the posttest scores 
through the operation of its effects with the 
general ability of the student. An analysis 
using a different approach (analysis of co- 
variance via SPSS; Nie et al., 1975) yielded 
somewhat different values but would sustain 
the same conclusions. 
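The crossover implied by the three group regression equations can be located by setting two predicted scores equal and solving for the ability sum. The sketch below does this under two simplifying assumptions that are not part of the original analysis: the pretest score is the same for both students being compared (its coefficient, .75, is identical across groups, so it cancels), and the difference score is held at zero.

```python
# Intercept and slope on the ability sum for each group, taken from the
# group regression equations; difference is held at 0 (an assumption).
GROUP_LINES = {
    "fluid": (-4.16, 0.05),
    "crystallized": (-10.46, 0.09),
    "control": (0.27, 0.01),
}


def predicted(group, ability_sum, pretest=0.0):
    """Predicted posttest score with the difference score fixed at zero."""
    intercept, slope = GROUP_LINES[group]
    return intercept + 0.75 * pretest + slope * ability_sum


def crossover(group_a, group_b):
    """Ability sum at which the two groups' predicted scores are equal."""
    intercept_a, slope_a = GROUP_LINES[group_a]
    intercept_b, slope_b = GROUP_LINES[group_b]
    return (intercept_b - intercept_a) / (slope_a - slope_b)


if __name__ == "__main__":
    print(round(crossover("crystallized", "fluid"), 3))
    print(round(crossover("crystallized", "control"), 3))
```

Under these assumptions, the crystallized line has the steepest slope, so it falls below the other two lines at low ability sums and rises above them at high ability sums, which is the disordinal pattern described in the text.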


Discussion 


The results suggest that if a student shows 
lower general ability, instruction focused on 
fluid ability would be most beneficial for 
learning the mathematical concept. On the 
other hand, if a student shows higher general 
ability, then instruction focused on crystal- 
lized ability would be the most helpful. The 
results provide no evidence that the dis- 
tinction between fluid and crystallized 
ability is important for instruction. But this 
latter conclusion depends on the adequacy 
of the Tc and Tf treatments constructed 
here. It is evidently more difficult to create 
instruction uniquely suited for students high 
on fluid ability relative to crystallized abili- 
ty. 
Fluid and crystallized ability are closely 
related theoretically and empirically. The 
difference score used here to reflect the 
special qualities of each may have been 
largely error variance, so perhaps this is not 
a definitive test of the distinction. Further 
attempts need to be made. But these data 


do suggest that we know how to capitalize on
crystallized ability; we do not know how to
capitalize on fluid ability. The results
certainly should not be regarded as a definitive
test of the Cattell-Horn theory. Neither the
PMA nor the IPAT is a pure measure of the
two types of abilities, and thus the distinction
between the abilities needs to be measured
much more precisely. With due consideration
of the constraints involved, the
results nevertheless may provide a beginning
understanding of aptitude-treatment
interactions in the case of fluid and crystallized
abilities, as well as a partial answer to the
question of whether methods of instruction
can be found that better serve one or the
other of the two types of ability.


Reference Note 


1. Horn, J.L. Personal communication, April 1973. 
Fluid and Crystallized Intelligence and Broad Perceptual
Factors Among 11 to 12 Year Olds 


Lazar Stankov
A battery of 36 visual and auditory tests was given to a sample of 113 primary
school children. Second-order analysis of the data yielded two well-defined
factors representing Fluid (Gf) and Crystallized (Gc) Intelligence and two
perceptual factors corresponding to General Visualization (Gv) and General
Auditory Function (Ga). Perceptual factors were not clearly separated from
broad intellective factors at this age level.


Factor analyses of large batteries of tests 
consisting of markers for the well-replicated 
primary abilities usually lead to extraction 
of two broad factors at the second order. 
Both factors involve the processes of per- 
ceiving relationships, educing correlates, 
reasoning, abstracting, attaining concepts, 
and solving problems, that is, the processes 
usually claimed to be important for intelli- 
gent behavior. One of these two factors, 
Fluid Intelligence (Gf), is involved in tasks 
in which relatively little advantage accrues 
from intensive or extended education and 
acculturation; the other one, Crystallized 
Intelligence (Gc), represents tasks in which 
either the content or the operations involved 
depend on education and acculturation. 
This theory, now known as the theory of 
fluid and crystallized intelligence (Gf-Gc
theory), was first proposed by Cattell and 
then supplemented by Horn (see Cattell, 
1963, 1971; Horn, 1968, 1970, 1975; Horn & 
Cattell, 1967). 

Two features of the Gf-Gc theory that are 
relevant for the discussion here are (a) that 
it is a developmental theory of the structure 
of human abilities and (b) that it takes into 


The work reported in this article was supported by 
account the fact that, in addition to Gf and
Gc, there are several other broad factors 
operating within the putative measures of 
intelligence. With regard to the first feature, 
the references cited above provide the the- 
oretical account and review of the relevant 
literature. Although Cattell (1967) reported
data on children to check (at the subadult
level) the course of development of Gf and
Gc alone, most of the studies on Gf and Gc
have been based on adult subjects.
Following Horn’s formulations of the 
Gf-Gc theory, Undheim (1976) postulated 
that at the age of 10 to 11 years, when school 
has exerted influence for some years but 
little specialization has yet occurred, (a) the 
aspects of intelligence most tied to verbal- 
education experience should be a tightly knit 
unit and (b) fluid and crystallized intelli- 
gence, if distinguishable at all, should be 
highly correlated, more so than has been
observed in studies with adults. Undheim 
found evidence for both predictions. He 
also postulated and found evidence that at 
this age level, Gf and Gc could be distinguished
from two other broad factors, that
is, Speediness (Gs) and General Visualization
(Gv).


Similar hypotheses can be advanced re- 
garding the age group on which the present 
study is based. The last-mentioned broad 
factor (Gv) represents visual perceptual 
processes (such as imagining the way objects 
may change as they move in space, keeping 
configurations in mind, etc.) and reflects the 
fact that most ability tests use only one (vi- 
sual) modality. Its repeated occurrence 
suggests that if the tests were devised to 
measure Gf and Gc through a different
modality, then another perceptual factor 
would be necessary to reflect this modality. 
Stankov (1971) confirmed this hypothesis 
with auditory tests by obtaining a General 
Auditory Function (Ga) factor as well as Gf, 
Gc, and Gv. That study again used adults. 
The main concern here will be to discover
whether the same four factors (i.e., Gf, Gc, 
Gv, and Ga) can be identified with younger 
subjects. Information on this question will 
contribute to knowledge of the develop- 
mental course of the perceptual factors and 
will also provide a check on what has been 
found regarding Gf and Gc. 

In addition to Horn's and Cattell's writ- 
ings, a recent article by Bock (1973) gave a 
comprehensive summary of what is known 
about Gv (though not stated in terms of
Gf-Gc theory). Much less is known at
present about the nature of Ga. For that 
reason, before turning attention to the 
present study, it is necessary to review 
briefly some outcomes of the study in which 
Ga was discovered. 


General Auditory Function 


Stankov (1971) gave a battery of 71 tests 
to a sample (N = 241) of adult subjects aged 
18 to 61 years. The battery consisted of two 
sets of tests. Those in the first set were vi- 
sual tests measuring well-replicated primary 
abilities drawn mainly from the French, 
Ekstrom, and Price (1963) list and chosen to 
measure Gf, Gc, and Gv at the higher order.
The second set of tests (50 altogether) con- 
sisted of auditory tests only, classifiable into 
the following categories: First, some tests 
were the same as the visual tests but were 
given through auditory channels (e.g., Vo- 
cabulary Test). Second, some tests were 
created by the author in order to be as close 
as possible to the visual tests. For example, 
an analogue of the Letter Series or Number 
Series Test would be an auditory test in 
which frequencies or intensities of tones were 
varied in such a way as to form a sequence. 
Third, some tests were standardized “mu- 
sical abilities” tests, such as Seashore’s and 
Wing’s batteries, or were created on the basis 
of the descriptions of the tests used in speech 
perception studies (e.g., the White Noise 
Masking Test). 
Factor analysis of the auditory tests alone
produced seven factors (see those marked 
“auditory” in the modality column of Table 
1). These seven factors, at the second order, 
formed three factors that were interpreted 
as Gf, Gc, and Ga. Extension analysis and 
the second-order analysis based on all 71
tests showed that Gf, Gc, Gv, and Ga could 
be clearly identified. Therefore, Gf and Gc 
can be tapped by both visual and auditory 
tests. Gv is defined by visual tests only, and 
it does overlap with Gf. Ga is defined by 
auditory tests, and it overlaps with Gc. Fi- 
nally, there is a relatively small overlap be- 
tween Gv and Ga. The General Audi- 
tory Function involved processes of recog- 
nizing words spoken with some kind of in- 
terference, maintaining steady tempo and 
recognizing changes in rhythmic patterns, 
keeping the order of presentation of tones in 
mind and recognizing changes in this order, 
and so on. 

General Auditory Function correlated 
with age and with auditory loss detected by 
an audiometer test. This can be interpreted 
to mean that the decline in Ga during adult 
years depends on the reduction of auditory 
acuity commonly observed in older people. 
All this information was taken to indicate 
that Ga indeed represents broad auditory 
perceptual processes akin to Gv in the visual 
domain. Horn (1974) provides a detailed 
discussion of auditory factors and of their 
relationship with visual primaries. 


The Problem 


The aim of the present research is to in- 
vestigate whether the results obtained by 
Stankov (1971) could be replicated with 
younger subjects and in a different culture. 
To this end, a battery of 36 tests was as- 
sembled to measure seven auditory factors, 
plus six visual primaries known to load Gf, 
Gc, and Gv. An attempt was made to ensure
that at least three tests defined every pri- 
mary, but that was not possible due to the 
time limits imposed. Table 1 presents the 
list of primaries, variables used to assess 
these primaries, and hypothesized second-order
structure based on the previous studies.
Most of the primaries used here are the 

Table 1
Primary Factors, Tests Used to Assess Them, and Hypothesized Second-Order Structure
Primary factor Symbol Modality Test^a Gc Gf Gv Ga
Verbal Comprehension M Autore TES x 
Experiential Evaluation EMS Visual 45, 2 x x 
Induction I Visual 5 S A 
Auditory Induction Ia Auditory aaa z 
Memory Span Ms Auditory 12, 13, 1 4 x 
"Temporal Reordering Tr Auditory m w x 3 * 
Sound Pattern Recognition Pr Auditory 17, 25:9 x z 
Relation Perception Rp Auditory PE E 
Spatial Scanning Ss Visual re 2 
Flexibility of Closure Cf Visual » 2 Hi x 1 
Perceptual Speed P Visual 28, 29, 3 
Masking Mg Auditory 31, 32, 33 
'Tempo T Auditory 34, 35, 36 x 
Note. Gc represents the factor of Crystallized Intelligence; Gf, the factor of Fluid Intelligence; Gv, the factor of General
Visualization; and Ga, the General Auditory Function.
^a The test numbers correspond to the tests listed in Table 2.


same as those employed by Stankov (1971). 
There are two new primaries (Spatial Scan- 
ning and Perceptual Speed) that were not 
used before with auditory tests but are 
known to measure Gv. Also, two new tests
were added as markers for Experiential 
Evaluation. 

The auditory battery was slightly different 
from the previous study. All the tests used 
do represent markers for a particular pri- 
mary. Every primary, however, was defined 
before by more than two or three variables, 
and the choice of particular markers here 
would have to change somewhat the nature 
of the primary itself. For example, Auditory 
Induction (Ia) is here defined by two tests 
that previously loaded on Reasoning (R). In 
the previous study, Reasoning clearly in- 
volved both Inductive and Deductive Rea- 
soning, but the markers chosen for this study 
were only those for Inductive Reasoning. 
Similarly, Tempo (T) here was a part of the 
Rhythm factor previously, and Relation 
Perception was previously a part of the 
Memory-Span Relation Perception factor. 

Most of the visual primaries are well 
known and need not be discussed here. The 
auditory primaries are given operational
definition by their marker tests. Temporal
Reordering represents the ability to keep in
mind the order of presentation of auditory 
stimuli and to recognize these stimuli when 
presented in a different order. Sound Pattern
Recognition represents the ability to
recognize a pattern consisting of series of
either tones, chords, or letters. Relation
Perception is rather heavily loaded with
auditory memory, but its basic feature is that 
it involves the ability to perceive the rela- 
tionship among auditory stimuli consisting 
of either tones or meaningful sentences. 
Masking is the ability to hear words spoken 
under conditions of various kinds of inter- 
ference, and it can be interpreted as a fig- 
ure-ground phenomenon. Finally, Tempo 
requires the ability to maintain steady 
tempo either during the silent interval or 
when an interfering beat exists. It is im- 
portant in perceiving rhythm either of a 
musical kind or as it exists in spoken language.

The hypothesized loadings of the pri- 
maries on the second-order factors are also 
indicated in Table 1. It is this pattern that 
is explored here. 


Method 
Subjects 


Subjects (N = 113) were fifth- and sixth-grade pupils 
from a primary school in a relatively homogeneous 
working-class suburb of Belgrade, Yugoslavia. Their 
age was 11 to 12 years at the time of testing in the spring 
of 1973. About 50% were girls. Their native language
was Serbo-Croatian, a dominant language of Yugoslavia.

Table 2
Test Statistics

Test                                  No. items   M       SD      r_xx
1. Multiple-Choice Intelligibility    60          44.38   9.94    .80
2. Cloze                              36          17.11   6.33    .63*
3. Verbal Meaning                     24          15.49   5.23    .75
4. Social Situations                  14          6.36    1.80    .60*
5. Perception of Human Behavior       25          13.96   3.48    .52
6. Recognition of Emotions            16          8.82    2.25    .65*
7. Letter Series                      20          5.51    3.76    .73
8. Number Series                      20          7.98    4.37    .58
9. Sequential Tonal Series            20          8.05    4.81    .58
10. Chord Series                      19          7.97    6.37    .75
11. Tonal "Gottschaldt" Figures       20          9.22    6.31    .84
12. Number Span (Forward)             12          2.76    2.88    .85*
13. Number Span (Backward)            12          1.08    1.12    .53*
14. Letter Span                       12          1.87    1.29    .67*
15. Tonal Reordering                  20          5.82    3.29    .74*
16. Letter Reordering                 20          9.71    4.26    .62
17. Rapid Spelling                    20          10.55   3.36    .55
18. Wing's Pitch Change               20          6.36    2.26    .59*
19. Seashore's Tonal Memory           30          14.39   5.60    -16
20. Tonal Figures                     18          6.36    2.27    .50*
21. Memory for Emphasis               24          8.00    2.63    .49*
22. Labyrinths                        24          11.10   4.37    .67*
23. Pursuits                          48          42.37   8.10    .62*
24. Map Planning                      20          6.09    2.70    TIY
25. Designs                           200         17.15   8.50    .68*
26. Hidden Figures                    15          9.02    2.85    .56
27. Copying                           32          18.53   4.85    -70*
28. Identical Pictures                24          21.89   3.81    STAN
29. Name Comparison                   13          10.13   2.68    J70*
30. Picture and Number Comparison     24          14.54   2.74    .68
31. Intellective Masking              20          15.76   2.45    EL
32. Compressed Speech                 70          6.55    5.51    .62*
33. Traffic-Noise Masking             20          15.15   2.38    .53*
34. Seashore's Rhythm                 10          3.04    1.47    .57*
35. Drake's Rhythm A                  15          97.50   29.53   TI“
36. Drake's Rhythm B                  11          99.12   17.73   :19*

* These are communalities from the first-order analysis.


Tests 


Whenever possible, a Yugoslavian version of the test 
was used. Otherwise, instructions were translated from 
English; for some tests, the items themselves were 
translated as well. The list of tests used and their sta- 
tistics are given in Table 2. Short descriptions of each 
test are as follows (asterisks indicate that the test was 
used by Stankov, 1971):

* Multiple-Choice Intelligibility Test. Subjects 
were asked to select the spoken word among the four 
phonetically similar words written on a sheet of paper. 
Yugoslavian version was created by the author. Score 
was number correct. 

* Cloze Test. Subjects were to write down two 
missing words from an eight-word sentence read from 
the tape recorder. Yugoslavian version was created by 
the author. Score was number of words that would 
correctly complete the sentence. 


* Verbal Meaning Test. This was the Yugoslavian 
version of the Vocabulary Test. Score was number 
correct. 

* Social Situations Test. Subjects chose among four 
alternatives the one most acceptable way of behaving 
in typical social situations. Score was number cor- 
rect. 

Perception of Human Behavior Test. Yugoslavian 
version of a Social Intelligence Test, for example, “Old 
people usually claim that this world is heading to a di- 
saster.” Subjects had to indicate if the statements were 
likely to be true or false. Score was number correct. 

Recognition of Emotions Test. Yugoslavian version 
of a Social Intelligence Test, for example, “I love him 
so much that I could drink his blood." Subjects had to 
choose among four emotions the one which is expressed 
by the statement. Score was number correct. 

* Letter Series Test. Yugoslavian version (Cyrillic 




Table 3
Correlations Among Primary Factors 
Primary factor


1. Verbal Comprehension 69 54 
2. Experiential Evaluation 31 
3. Induction (visual) 

4. Auditory Induction 

5. Memory Span 

6. Temporal Reordering 

7. Sound Pattern Recognition 

8. Relation Perception 

9. Spatial Scanning 

10. Flexibility of Closure 

11. Perceptual Speed 

12. Masking 

13. Tempo


Note. Decimal points are omitted. 


alphabet) was created by the author. Score was number 
correct. 

* Number Series Test. Score was number correct. 

* Sequential Tonal Series Test. Four notes were 
played in such a way as to form a series (ascending, de- 
scending, or other). Then, three alternatives were 
given, and subjects were to indicate which one of them 
represented the continuation of the series. Score was 
number correct. 

* Chord Series Test. Similar to the previous test. 

* Tonal "Gottschaldt" Figures Test. A chord 
composed of three notes was presented. After this, 
three alternatives were given. Each alternative was a 
two-note interval. One of the alternatives involved two 

notes from the first chord. Subjects had to indicate 
this alternative. Score was number correct. 

* Memory Span Tests (Forward and Backward 
Number Span and Letter Span). Tests from French
et al. (1963). 

* Tonal Reordering Test. Three notes were played, 
and after a pause, the same three notes were played 
again in a different order. Subjects had to write down
the order in which the notes were played the second 
time. Score was number correct. 

* Letter Reordering Test. Similar to the previous 
test. 

* Rapid Spelling Test. Subjects had to write down 
words spelled at a rapid pace. Note that since Yugo- 
slavian spelling is phonetical, the task really requires the 
ability to form a whole from the elements; it is not the 
acquired ability to spell as in English. Score was 
number correct. 

* Wing’s Pitch Change Test. Subjects had to decide 
whether two chords were repeated exactly or not. Score 
was number correct. 

* Seashore’s Tonal Memory Test. Pairs of tunes 3 
to 10 notes long were presented. Subjects indicated 
which note had been changed in the second playing. 
Score was number correct. 

* Tonal Figures Test. Subjects were given a set of 
four notes presented in ascending or descending order. 
After this, four alternatives were given. In all the al- 
ternatives, notes were played in the opposite order. 


55 50 28 
38 30 2L 
40 27 


Only one alternative had the same notes as those of the
first set. Subjects had to choose this alternative. Score
was number correct. 

* Memory for Emphasis Test. Subjects listened to
a paragraph with certain words markedly emphasized.
They were required to identify these words on a written
script at the conclusion of the reading. Score was
number correct.

Labyrinths, Pursuits, Map Planning, Designs,
Hidden Figures, Copying, Identical Pictures, Name
Comparison, and Picture and Number Comparison
Tests. All these tests were proper markers for
primaries from French et al. (1963). They had been in
use in Yugoslavia for 10 years.

* Intellective Masking Test. Subjects were asked
to select from a list of phonetically similar words a word
heard against an increasingly loud background of a
second continuous speaker. Score was number correct.

* Compressed Speech Test. Sentences were re-recorded
with an increased speed of tape movement.
Subjects were to write down the sentences. Score was
number of words correct.

* Traffic-Noise Masking Test. Similar to the Intellective
Masking Test.

* Seashore's Rhythm Test. Subjects compared two
rhythmic patterns to judge them the same or different.
Score was number correct.

* Drake's Rhythm A Test. Subjects were to continue
to count a beat established by a metronome during silence
until told to stop. Score was number of beats
different from norm.


Procedure 


All tests were recorded and presented by a tape recorder.
For visual tests, the time allowed was checked
by a silent interval on a moving tape. Before the start
of a testing session, subjects were given answer sheets

Table 4
Promax-Rotated (Pattern Values) Unrestricted Maximum-Likelihood Factor Analysis Solution
for the Matrix of Table 3


Factor 
1 2 3 4 

Primary factor (Gc) (Gf) (Gv?) (Ga?) h²
Verbal Comprehension 1.01 1.00 
Experiential Evaluation 418 ‘50 
Induction (visual) .33 .30 39 
Auditory Induction .60 43 
Memory Span .24 33 82 
Temporal Reordering 95 71 
Sound Pattern Recognition 81 255. 
Relation Perception 57 40 
Spatial Scanning 31 .32 44 
Flexibility of Closure 1.02 1.00 
Perceptual Speed 45 26 46 
Masking 32 27 40 
Tempo 1.03 1.00 


Note. Gc represents the factor of Crystallized Intelligence; Gf, the factor of Fluid Intelligence; Gv, the factor of General
Visualization; and Ga, the General Auditory Function.


(and actual tests for the visual part of the battery). 
They were told that most of these tests had never been 
given to children and that the purpose of testing was to 
check if children could understand and do the problems. 
It was made clear to them that school officials would not 
be given the actual results of their performance. This 
information was sufficient to ensure good rapport. 

The battery required 4 school hours, administered to 
each class on 2 consecutive days (2 hours each day). 
There were three fifth-grade and three sixth-grade 
classes. Special care was taken to prevent intentional 
or unintentional copying. During testing, two adults 
(not teachers) were present in the classroom. 


Statistical Analyses 


Two kinds of analyses were performed. First, the 20 
auditory tests alone and then the full 36-test battery 
were factor analyzed using the principal-components 
procedure. These results are not presented here be- 
cause the ratio of subjects to variables was unfavorable. 
They served only to obtain communality estimates for 
those variables for which reliability estimates were not 
available. These communality estimates are given in 
Table 2. 

Inspection of test means and standard deviations, 
relative to the total number of items, shows that the 
majority of tests have an adequate level of difficulty, but 
some appear too difficult and others too easy. Also, 
reliability estimates appear lower than those typically 
reported for standardized tests, though not considerably 
lower than what other investigators (e.g., Guilford, 1967)
report with similar measuring instruments. To overcome
these problems, z scores were computed for every
test and summed in sets like those in Table 1 to form 
composite scores for 13 primaries. The main interest 
of the present study was in checking the second-order 
structure. 
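
The composite-score step described above can be sketched as follows. This is a minimal illustration, not the author's procedure in detail: the raw scores are made up, and the use of the population standard deviation when standardizing is an assumption, since the text does not say which form was used.

```python
from statistics import mean, pstdev

def z_scores(raw):
    """Standardize one test's raw scores across subjects."""
    m, s = mean(raw), pstdev(raw)
    return [(x - m) / s for x in raw]

def composite(tests):
    """Sum the z scores of several tests, subject by subject."""
    standardized = [z_scores(t) for t in tests]
    return [sum(zs) for zs in zip(*standardized)]

# Hypothetical raw scores for five pupils on the three Memory Span markers.
number_span_forward = [2, 3, 1, 5, 4]
number_span_backward = [1, 1, 0, 3, 2]
letter_span = [2, 1, 1, 3, 3]

memory_span = composite([number_span_forward, number_span_backward, letter_span])
print([round(score, 2) for score in memory_span])
```

Because each test is rescaled to a common mean and spread before summing, a test with many items or an extreme difficulty level does not dominate the composite.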


The correlational matrix of composite scores, shown 
in Table 3, was analyzed using unrestricted maximum-
likelihood factor analysis (Jöreskog, 1967).¹ This
method was recommended by McDonald (1974) as a 
theoretically adequate way of avoiding the problem of 
indeterminacy in common factor analysis. Velicer 
(1976) reports that with several empirical correlational 
matrices, it produced solutions very similar to those of 
principal-components and image analyses. With this 
method, the obtained solution is scale free, and a sta- 
tistically based test for the number of factors is avail- 
able. Unrestricted maximum-likelihood factor analysis 
also incorporates a solution for the Heywood case (im- 
proper solution), which can occur with this method. If 
an improper solution obtains for some variables, say 2 
out of 13, these are expressed as principal components 
and partialed out of the correlation matrix R. The new 
11 X 11 matrix is then analyzed using maximum likeli- 
hood, and k — 2 factors are extracted (k being the 
number of factors desired). The final solution consists 
of combined principal-components vectors and k — 2 
vectors from the partial matrix. Communalities for 
variables causing the improper solution cannot be larger 
than one. 
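
Only the first step of this procedure, spotting the variables that cause an improper solution, is easy to show compactly. The sketch below flags variables whose communality reaches or exceeds one, taking the communality as the sum of squared loadings on orthogonal (unrotated) factors. The loadings and variable names are hypothetical, and the partialing step itself is not implemented here.

```python
def communalities(loadings):
    """Sum of squared loadings per variable (orthogonal factors assumed)."""
    return [sum(value * value for value in row) for row in loadings]

def heywood_variables(loadings, names, tol=1e-9):
    """Names of variables whose communality is at or above 1.0."""
    return [
        name
        for name, h2 in zip(names, communalities(loadings))
        if h2 >= 1.0 - tol
    ]

# Hypothetical two-factor loadings for three variables.
names = ["A", "B", "C"]
loadings = [
    [0.9, 0.5],   # 0.81 + 0.25 = 1.06, an improper value
    [0.6, 0.3],
    [0.7, 0.2],
]
print(heywood_variables(loadings, names))
```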

¹ The correlational matrix of Table 3 was also analyzed
by principal components, principal factors, Little
Jiffy Mark IV (Kaiser & Rice, 1974), and Alpha-KD
(Kaiser & Case, Note 1). Unrestricted maximum-
likelihood factor analysis provided a solution that
closely agrees with the interpretation that could be
achieved by considering all these solutions together and
so was chosen here.

The number of factors was determined by a chi-square
goodness-of-fit test. For the four-factor solution,
the chi-square value with 35 degrees of freedom was 39.44.
This value has a probability level of .28, indicating
acceptable fit; four factors can be used for interpretation.
The matrix obtained from the unrestricted maxi- 
mum-likelihood factor analysis was first rotated by 
varimax and then by promax (power set to 5). Table 
4 contains all salient loadings (those above .20) from 
promax. 
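
The goodness-of-fit figures can be checked with a short calculation. The sketch below uses the standard degrees-of-freedom formula for an unrestricted k-factor model on p variables, df = ((p - k)^2 - (p + k)) / 2, and a series expansion of the incomplete gamma function for the chi-square tail probability. One reading that reproduces the reported 35 degrees of freedom is that three primaries were partialed out as improper, leaving 10 variables and 4 - 3 = 1 maximum-likelihood factor; this is an inference from the numbers, not something the text states.

```python
import math

def ml_factor_df(n_variables, n_factors):
    """Degrees of freedom for the likelihood-ratio test of a k-factor model."""
    p, k = n_variables, n_factors
    return ((p - k) ** 2 - (p + k)) // 2

def chi_square_sf(x, df, terms=500):
    """Upper-tail probability of a chi-square variate.

    Uses the power series for the regularized lower incomplete gamma
    function P(a, x/2) with a = df/2, then returns 1 - P.
    """
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(a * math.log(half_x) - half_x - math.lgamma(a))
    return 1.0 - lower

print(ml_factor_df(13, 4))   # all 13 primaries, 4 factors
print(ml_factor_df(10, 1))   # 10 primaries left, 1 factor
print(round(chi_square_sf(39.44, 35), 2))
```

With 35 degrees of freedom the tail probability for 39.44 comes out close to the .28 given in the text.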


Results 


Attending first to communalities, it can 
be seen that the variables with the lowest 
communalities are Induction (visual) and 
Memory Span. These are postulated mea- 
sures of Gf; if they indeed measure the same 
factor, their loadings on it would be low. It 
is also apparent that three variables (Verbal 
Comprehension, Flexibility of Closure, and 
Tempo) cause an improper unrestricted 
maximum-likelihood factor analysis solu- 
tion. In itself, this may not be reason for 
concern, but if these variables show high 
loading on one factor when other variables 
have low loadings, there would be strong 
indication that this factor is a specific one. 
Inspection of Table 4 shows that the factor 
on which Verbal Comprehension has the 
highest loading (Factor 1) also has high 
loadings on several other factors, but the 
same does not hold for Tempo and Flexi- 
bility of Closure. It can be argued that these 
two factors should be dropped and the whole 
analysis should be repeated without them. 
This was not done with the solution of Table 
4 because (a) Tempo appears to share some 
common variance with Masking, and Flexi- 
bility of Closure shares some variance with 
Spatial Scanning; (b) Tempo was a well- 
defined factor in the first-order analysis, and 
Flexibility of Closure is an established pri- 
mary factor within this age group (French et 
al., 1963); (c) some other solutions with the 
matrix of Table 3 (e.g., principal compo- 
nents) indicate that Factors 3 and 4 might be 
broader than those produced by unrestricted 
maximum-likelihood factor analysis. 
Although Tempo and Flexibility of Clo- 
sure were retained, it is important to observe 
caution in interpreting Factors 3 and 4, since 
clearly, they are not as broad as one would 
hope. The following discussion assumes 
that the data provide evidence for Factors 1 
and 2 and that the other two factors are im- 
plied but not definitely established. 
Factor 1: Crystallized Intelligence (Gc)


Two variables with high loadings on this
factor are Verbal Comprehension and
Experiential Evaluation. These are
considered to be good markers of crystallized
intelligence, since both of them imply the
effect of acculturation. Several other
variables also have noteworthy loadings on this
factor, the highest being Perceptual Speed.
This primary is typically found to be a good
marker for Gv (and also Broad Speediness,
or Gs), and it is necessary to explain its
appearance here. Three tests were used to
define this primary: Identical Pictures,
Number and Picture Comparison (a subtest
from the Beta test), and Name Comparison.
The last-mentioned test involved comparison
of long Yugoslavian names, and it can be
considered to be a measure of learning and
acculturation in this age group. In other
words, it can be claimed that spelling has not
been mastered within this age group; therefore,
this test does not measure pure Perceptual
Speed and should be an indicant of
Gc as well. Also, the Perceptual Speed
loading may reflect the fact that part of Gc
is measured by visual tests.
We should keep in mind that Gf-Gc
theory accepts the possibility of overlap
among the factors and allows that one
variable can measure two (or more) factors.
This would then explain Gc loadings of
other variables as well. For example, Spatial
Scanning also loads this factor, indicating
again that part of Gc is measured through
the visual modality. Masking would
implicate auditory input too, and finally,
Memory Span and Induction point to the
well-known overlap between Gf and Gc.
Note, however, that all the other variables
besides Verbal Comprehension and
Experiential Evaluation do not have high
loadings on this factor, and therefore, it can be
concluded that this is a broad Gc factor.


Factor 2: Fluid Intelligence (Gf)
Measured Through Auditory Modality 


Temporal Reordering, Sound Pattern
Recognition, Auditory Induction, and
Relation Perception show high loadings on
this factor. Somewhat lower, but still
noteworthy, are loadings of Memory Span,
Induction, and Perceptual Speed. This is 
obviously a rather broad factor and its in- 
terpretation is not easy. Most primaries of 
this factor, and in particular those with high 
loadings, are new auditory primaries that 
have been insufficiently explored. There are 
several possibilities. 

It may be argued that since music is 
taught in schools and some of the tests involve
tones, this is another Gc factor reflecting
specific acculturation influence.
That would be an unlikely conclusion be- 
cause not all primaries of this factor involve 
musical stimuli, and even for those that do, 
it is hard to see why performance on them 
would depend on musical training. 

Another possible hypothesis may be that 
this is a General Auditory Function (Ga). 
This is probably not the case because all the 
primaries involved require relatively com- 
plex manipulation of stimulus input, not 
merely perception. Also, only one of the two 
“pure” markers for Ga (Masking) has a low 
salient loading. In addition, it can be seen 
that two primaries (Perceptual Speed and 
Induction) are visual. While visual and 
auditory processes may correlate, such cor- 
relation is likely to be lower between the 
perceptual than between “higher” mental 
processes (Stankov, 1971). In other words, 
it is more probable that a factor defined by 
both auditory and visual primaries repre- 
sents higher processes than those typically 
implied by the word perception and by 
Ga. 

Among the other possibilities that should 
be considered are (a) short-term acquisition 
and retrieval function (SAR), (b) interme- 
diate memory, and (c) fluid intelligence (Gf). 
All these are more likely candidates than 
either Ga or Gc; indeed, in the earlier writings
of Horn, they were all treated as one
broad factor, that of fluid intelligence. SAR, 
which is to some extent similar to Jensen’s 
Level I ability, is measured by tests of im- 
mediate and associative memory. Since 
Memory Span has a relatively low loading on 
this factor and no primary represents a clear 
measure of associative memory, it would be 
inappropriate to identify it as such, but a 
certain similarity should be acknowledged. 
Recently, Horn (1975) indicated that in the
life-span development of intelligence, an-
other broad factor corresponding to the
process of intermediate memory (memory 
for events lasting more than 30 sec but less 
than 1 hour) can be detected. He quotes 
some of the tests and primaries of Factor 2 
as appropriate measures of it. On the other 
hand, there is as yet little evidence showing 
clearly the distinction between intermediate 
memory and other broad factors. 

Working with adults, Stankov (1971) 
found that a majority of tests used to mea- 
sure primaries of the present Factor 2 define 
a second-order factor identified as Gf. It is 
true that the highest loadings in the 1971 
study were those of Memory-Span tests, and 
possibly a reinterpretation of these data 
along the lines of a new SAR factor should be 
attempted. Also, while the loading of In- 
duction on Factor 2 is relatively low, its 
presence along with Memory Span suggests, 
at least in part, the processes called fluid 
intelligence. Factor 2 is therefore tenta- 
tively called fluid intelligence measured 
through auditory modality, with the un- 
derstanding that it also embraces processes 
of SAR and intermediate memory. 


Factor 3: General Visualization (Gv)? 


As already noted, the last two factors are 
rather narrow: Only two primaries have 
salient loadings on them. Factor 3 is pre- 
dominantly Flexibility of Closure with a 
much lower loading of Spatial Scanning. 
While Perceptual Speed loads Gc and Gf, it 
is worth noting that in one of the other so- 
lutions attempted with this correlational 
matrix (Alpha-KD; Kaiser & Case, Note 1), 
it did load on this factor. 

Undheim (1976) reported a broad Visu- 
alization factor among the 10 to 11 year olds 
in Norway, and present results are obviously 
in poor agreement with his. But Undheim's 
broad Visualization factor might not be 
broader than the one we have here. He had 
postulated three visualization primaries: (a) 
Spatial Relation (S), measured by two tests 
that produced one factor at the first order; 
(b) Speed of Closure (Cs), measured by 
parallel forms of the same test (producing a 
doublet at the first order); and (c) Visual-
ization primary (Vz), measured by three

Table 5

Factor Intercorrelations

                    Factor
Factor    Gc     Gf     Gv     Ga

Gc       1.00    .63    .51    .40
Gf        .63   1.00    .50    .40
Gv        .51    .50   1.00    .25
Ga        .40    .40    .25   1.00


Note. Gc represents the factor of Crystallized Intelligence; Gf, 
the factor of Fluid Intelligence; Gv, the factor of General Vi- 
sualization; and Ga, the General Auditory Function. 


tests that did not load the same factor at the 
first order. Because of this outcome at the 
first order, he included S and Cs composites 
and all three Vz tests in his analysis at the 
second order. His low loading (.22) for Cs on 
General Visualization is analogous here to 
what happened with Perceptual Speed. The 
highest loading on Gv was .60 for Spatial 
Relations, whereas all three Visualization 
tests had loadings in the .40s. If the three 
Visualization tests had been combined into 
one Vz composite, his result might look 
similar to the present solution. The breadth 
of his Gv factor depends on the inclusion of 
separate Vz tests in the second-order anal- 
ysis. 


Factor 4: General Auditory Function 
(Ga)? 


Like Gv, this factor is mainly Tempo with 
a low loading of Masking. Ga and Gv show 
the lowest correlation among all factor in- 
tercorrelations of Table 5. This agrees with 
the expectation that perceptual factors 
should have low correlations among them- 
selves. 


Discussion 


In interpreting results, one should keep in 
mind that instructions for all tests, and some 
tests themselves, were given in a different 
language from that in which they were 
standardized, and some tests might be 
poorly adapted for this age group. 

The results, nonetheless, indicate two of 
the four hypothesized factors. One of these 
represents crystallized intelligence; the other 
was tentatively labeled as fluid intelligence 
measured with auditory tests. The two
perceptual factors are less well established,
so their occurrence here can be viewed with
suspicion. Such an outcome might occur if
some factors were not sufficiently overde-
termined. With primaries based on com-
posites derived to a larger extent from the
first-order analysis, better overdetermina-
tion could have been achieved. Even with
this reservation, the present results deserve
consideration with respect to the age-dif-
ferentiation hypothesis.

With this age group, one broad factor
should reflect acculturational influences
(especially in a school system that allows little
specialization at this age level). A broad Gc
factor did emerge here. It was also hy-
pothesized that if Gf appeared in addition to
Gc, their correlation should be higher than
what is typically obtained with adults (be-
tween .20 and .50). As seen in Table 5, this
correlation (.63) is close to Undheim's (1976)
result of .64. He also obtained comparable
correlations between Gv, Gc, and Gf. On the
other hand, his fourth factor (broad Speed-
iness) had higher correlations with the first
three factors than Ga had in the present
study.

Some differentiation of abilities seems to
have taken place, and although a clear G
factor did occur, some additional common
variance also exists. The reason for sepa-
ration between Gf and Gc was attributed by
Undheim to motivational and other non-
cognitive factors in addition to age, but nei-
ther the present data nor Undheim's provide
a direct check of this.

A broad Gv factor has commonly been
observed among adults, in addition to Gf and
Gc. Comalli (1970) reviewed data on visual
illusions, spatial orientation, part-whole
differentiation, perceptual closure, and
speed of recognition. Most of these vari-
ables show progressive changes during
childhood until, at the ages between 14 and
20 years, performance is typically similar to
that of adults. Since development of Gv
must be related to these changes, it is not
surprising to find that at the age of 11 to 12
years, its showing appears poor and only
implied.

General Auditory Function, as the studies
with adults show, correlates with audiometer
measures, a finding that implies deteriora-
tion of hearing acuity. Since this should not
be a factor during childhood years on tests 
using typical speech sounds (e.g., Masking), 
perhaps a well-established Ga factor should 
not be expected. The present results indi- 
cate also that some individual differences 
occur with respect to Tempo (Rhythm), 
which while not directly related to hearing 
acuity has been found to measure Ga in 
adults. 

General Auditory Function may be poorly 
defined here because the auditory domain 
has not been sufficiently explored; the au- 
ditory primaries are then less well estab- 
lished than the visual ones. The inclusion 
of other primaries might define Ga more 
firmly. 

Another possibility is that in this study, 
the primaries themselves have not been 
replicated. The assumption that the pri- 
maries here are the same as in Stankov 
(1971) may not be justified. The first-order 
analysis, although performed, has not been 
reported here. It showed, however, that 
Tempo, Auditory Induction, Temporal 
Reordering, and Masking do appear at the 
first order. 

The fact that Gf and Gc have been repli-
cated with auditory tests is of considerable
importance for these two broad factors. Stankov (1971) elaborated
on several characteristics of auditory tests 
that bring into focus some features of intel- 
lectual functioning not easily accessed by 
ordinary visual tests. For example, se- 
quential presentation of stimuli, which is 
typical of auditory material, stresses the 
importance of temporal integration as part 
of intelligence. The paced and controlled 
nature of presentation as compared to rela- 
tively free work through the visual tests may 
cast a new light on the role of intellectual
speed. This and also the uncertainty asso- 
ciated with the need to keep in mind several 
stimuli spread over time may contribute to 
our understanding of the role of attentive- 
ness and of the relationship with other per- 
sonality variables such as risk taking. On 
the practical side, measurement of Gf and
Gc through the auditory modality opens up
the possibility of assessing the capabilities 
of various disadvantaged groups (e.g., the 
blind).

The two perceptual factors (Gv and Ga)
represent cognitive processes that seem 
relatively less complex than those required 
by intelligence tests. They are dependent, 
according to the theory, on the functioning 
of peripheral organs, afferent pathways, and 
projection areas of the visual and auditory 
cortex. Bock (1973) indicates that the left 
cerebral hemisphere may be responsible for 
the processing of Gc and some aspects of Ga. 
Both Ga and Gv seem to be restricted
somewhat (possibly artificially) in the scope
of material used to assess them. Gv seems
to emphasize pictorial material (excluding 
color and other dimensions of visual per- 
ception); Ga seems to be restricted to spoken 
words, time perception, and tones varying in 
pitch (excluding loudness, space localization, 
etc.). Finally, Ga appears to be closely linked 
to measures of auditory acuity, which seems 
not to be the case for Gv. 


Reference Note 


1. Kaiser, H. F., & Case, D. Computer algorithm for 
Alpha-KD (mimeographed paper). Berkeley: 
University of California, Department of Education, 
1975.


Relationship of Student Self-concept and Selected Personal
Variables to Participation in School Activities 


Joseph S. Yarworth and William J. Gauthier, Jr. 
Bucknell University 


The relationship between various aspects of student self-concept and student 
participation in the extra- and cocurricular activity programs of several Penn- 
sylvania high schools was explored. The study was a marked departure from 
previous studies because it combined psychological with personal variables in 
its examination of student participation. Known and hypothesized indicants 
of participation were explored. Results indicated that self-concept variables 
as well as personal variables were differential in the nature of their contribu- 
tions to different activity classifications. 


This study explored the relationship be- 
tween various aspects of self-concept and 
student participation in the extra- and co- 
curricular activity programs of several cen- 
tral Pennsylvania high schools. Because it 
combined psychological variables with per- 
sonal variables in its examination of student 
participation, the present study was a 
marked departure from previous studies. It 
explored both known and hypothesized in-
dicants of participation.

Unlike their European counterparts, 
American schools have stressed extra- and 
cocurricular activities as an integral part of 
the school program (Graham, 1964). 
Whether these activities, athletic and
nonathletic alike, have stressed the ideals of
mutual cooperation, guidance, practice for
social development or arousal of interest in 
the community (Yon, 1963), they have been 
incorporated into what Frederick (1959) 
described as the “third curriculum.” 
Despite some recognition of the students’ 
desire to participate for reasons of their own 
(e.g., the desire to accommodate the perfor- 
mance expectations of their parents as well 
as their classmates), researchers who have 
examined student activity programs have 
failed to examine the psychological basis for 
adolescent participation. Research in the 
field has been guided by a belief that socio-
logical reasons have been the major factors
that have influenced students’ extra- and
cocurricular activities.

The basic rationale for undertaking a 
study of the interaction between psycho- 
logical variables and participation in extra- 
and cocurricular activities may be found in 
Rehberg’s (1969) article dealing with student 
participation in interscholastic sports. After 
reviewing the literature that compared
participation with educational goals and
achievement in academics, Rehberg (1969)
indicated, 


It is plausible that some or all of the apparent positive 
association between educational pursuits and athletics 
results from joint association of each of these variables 
[academic achievement, goal aspiration, and partici- 
pation in sports] not with each other but with one or 
more antecedent variables [those that were temporally 
consequent to participation or nonparticipation and 
temporally antecedent to grades or educational expec-
tations]. (p. 75)


One such antecedent variable suggested by 
Rehberg was self-esteem, which he defined 
as a “positive self-image.” The belief that 
self-concept variables may be antecedent to 
other variables has been supported recently
by Shavelson, Hubner, and Stanton
(1976):


Self-concept, then, whether used as an outcome itself
or as a moderator variable [Rehberg’s antecedent
variables] that helps explain achievement outcomes, is 
a critical variable in education and in educational 
evaluation and research. 


Copyright 1978 by the American Psychological Association, Inc. 0022-0663/78/7003-0335$00.75


Taken individually they [self-concept studies] often
provide important insights into the factors that moti- 
vate students in and out of school and into alternate 
courses of action that may enhance students’ self-con-
cepts. (p. 408)


Rehberg, however, failed to pursue this 
hypothesis concerning antecedent variables 
as they relate to nonathletic experiences of 
the school activity program. Yet, Rehberg’s 
recognition that psychological constructs 
may be an important impetus for participa- 
tion indicated the necessity of conducting 
research in this area. 


Review of Literature and Related 
Research 


The pattern for the sociologically based 
research into school activities was estab- 
lished in the late 1940s with the publication 
of Havighurst and Taba’s (1949) text on
adolescent behavior. The authors indicated 
that 


to achieve success in adolescent peer culture a boy or 
girl must stay in school, be a reasonably good student, 
[and] take part in school activities. In the process of 
adjusting in those ways, he would be nearing middle-
class morality. (Havighurst & Taba, 1949, p. 36)


Until recently, the major studies of ado- 
lescent behavior (Coleman, 1961; Hollings- 
head, 1949; Mussen, Conger, & Kagan, 1966; 
Stendler, 1949; Taba, 1955) stressed social 
and economic factors while almost com-
pletely ignoring any psychological contri-
butions that might show some influence on
participation in the school activity pro- 
grams. 

During the 1950s and 1960s, researchers 
suggested that participation was correlated 
with academic achievement as reflected by 
the student's grade point average. Freder- 
ick (1959) in his text, The Third Curriculum, 
reviewed the literature in this area, which 
showed that students who were most active 
in the activity programs tended to receive 
the highest grades in academic studies. 

Following in this tradition, researchers 
(Bell, 1967; Bourgon, 1967; Coleman, 1961; 
Eidsmore, 1963; Koenig, 1969; McCray, 1967; 
Milliren, 1974; Nichols et al., 1973; Polk & 
Halferty, 1972; Schafer, 1972; Schafer, Oxela,
& Polk, 1972; Schendel, 1965; Smith, 1964;
Spady, 1971; Wertz, 1965) found significant
relationships between participating in both
athletic and nonathletic activities and aca-
demic achievement, curriculum track
membership, post high school career aspi-
rations, and stable personality characteris-
tics. This research ties directly to Super's
(1957; Super, Starishevsky, Matlin, & Jord-
aan, 1963) belief that vocational self-concepts
(defined by membership in certain career
paths through school) are part of the general
self-concept that students possess; hence,
one should expect a correlation among
self-concept, curriculum track membership,
academic achievement, and school activity
participation.

Although Ludwig and Maehr (1967) con-
cluded that success and failure in athletic
motor tasks changed students’ self-concepts
of their physical abilities (but not their
self-concepts in general), no studies except
that conducted by Phillips (1969) are re-
ported for the investigation of self-concept
and full activity program participation.

Using the Osgood semantic differential, an
instrument composed of pairs of adjectives
arranged on a scale ranging from 1 to 7 on a
continuum, and an activities checklist,
Phillips (1969) studied 199 students in a
Michigan high school to determine the rela-
tionship between participation in activities
and self-concept. He discovered that par-
ticipation in the activity program was sig-
nificantly related to self-concept scores for
boys but not for girls nor for the total
sample. No significant relationship was
found for girls between participation in any
activity and scores on the self-concept
measure.


An initial conjecture of Phillips’s study
was that nonparticipating students would
have lower self-concept scores. The data
provided by Phillips showed no evidence of
support for this hypothesis. He concluded
that variables other than the activity pro-
gram were instrumental in the development
of self-concept; unfortunately, he failed to
determine what these variables were. The
results reported in the present study bring
a number of Phillips's findings into ques-
tion.

Despite the ever-increasing literature
dealing with self-concept, few studies have
examined the direct relationship between
self-concept and participation in various
school activities. One reason for this rests
in the difficulties that attend self-concept
research.

As Wylie (1961, 1974) and Crowne and
Stephens (1968) indicate, research in the
area is complicated by its reliance on self-reports, the generality
of the concept of self-acceptance, the use of
socially desirable answers by respondents,
the absence of clear construct-level defini-
tions, the unsupported assumption of
equivalence of assessment procedures, the
failure to construct tests in accord with
principles of representative sampling, and
the problem of division between those who
view self-concept studies phenomenologi-
cally and those who view them behavioral-
ly.

Yet, despite these problems with self- 
concept studies, as Wylie (1961, 1974) 
maintains, if one does not use self-report 
instruments (acknowledging the weaknesses 
described above), there is no way of knowing 
whether self-concept exists; and subse- 
quently, one cannot begin to measure it. 
This position has been supported by Sha- 
velson et al. (1976, p. 411), who comment 
that “Self-concept is inferred from a person’s 
responses to situations” and “Self-concept 
is restricted to a person’s report of self.” 
Shavelson et al. (1976, p. 435) also acknowl-
edge the weaknesses of self-concept research
pointed out by Wylie (1961, 1974) and
Crowne and Stephens (1968): 


As a body of research, self-concept studies lack a focus 
that would result from an agreed upon definition of 
self-concept, lack adequate validation of interpretations 
of self-concept measures, and lack empirical data on the 
equivalence of the many self-concept measures cur- 
rently being used. 


These caveats should be kept in mind when 
reading any study of self-concepts.


Study Design 


This study was an ex post facto field study 
that examined the main and interactive re- 
lationships among five independent vari- 
ables and three dependent variables. The 
five independent variables were self-concept,
membership in a specific high school cur-
riculum track, academic achievement, sexual
classification, and grade classification. The 
dependent variables were participation in 
the total school activity program, subdivided 
into participation in the school athletic ac- 
tivity program and participation in the 
nonathletic activity program. The general 
hypotheses expressed in substantive terms 
were that the five independent variables 
would be significantly correlated with each 
of the dependent variables. 

In addition to a total self-concept score, 10 
subconstructs of self-concept were opera- 
tionalized by using the counseling version of 
the Tennessee Self-Concept Scale (TSCS; 
Fitts, 1965). Norms have been developed by 
standardizing the instrument across age, sex, 
race, intelligence, education, and economic 
levels, with the foregoing factors making 
negligible contributions to the variance in 
scale scores (Fitts, 1965). In the last 5 years, 
the TSCS has been employed in several ed- 
ucational studies that have examined cor- 
relations of self-concept with such factors as 
leadership, the effects of counseling on 
self-esteem, improving the aspiration level 
of disadvantaged students, intelligence and 
creativity, values clarification, and school 
dropouts (Kwal & Fleshler, 1973; MacKeen 
& Herman, 1974; McCormick & Williams, 
1974; Milgram & Milgram, 1976; Ohlde & 
Vinitsky, 1976; Thompson, 1972; Thornburg, 
1974). A Kuder-Richardson reliability 
coefficient of .91 and a test-retest reliability 
coefficient of .92 have been reported (Rad- 
ford, Thompson, & Fitts, 1971). A number 
of factor analytic studies have generally 
supported both content and construct va- 
lidity of the TSCS (Gable, LaSalle, & Cook, 
1973; Radford et al., 1971). 

In addition to a total positive self-concept 
score, the following subscales were calcu- 
lated: (a) Physical Self—individuals’ view 
of their body, health, physical appearance, 
and sexuality; (b) Moral-Ethical Self— 
individuals’ estimation of their moral worth; (c) Personal
Self—individuals’ sense of 
personal self-worth and adequacy; (d) 
Family Self—feelings of worth as a family 
member; (e) Social Self—individuals’ sense 
of worth in social interactions; (f) Iden- 
tity—individuals’ estimate of their basic 
identity; (g) Self-satisfaction—individuals’
degree of self-acceptance; (h) Behavioral
Self—individuals’ perception of their own 
behavior; (i) Self-criticism—individuals’ 
willingness to admit that mildly derogatory 
statements are true about themselves; (j) 
Instability—amount of inconsistency from 
one area of perception to another. 

The independent variable of membership
in a specific high school curriculum track was 
determined by examining each student's 
schedule. Each student was then assigned 
to a particular track (i.e., college prep, busi- 
ness, general, and vocational-technical) on 
the basis of the major subjects carried during 
the semester when the research was con-
ducted.

The academic achievement variable was
based on grade point average. Each stu-
dent's rank was converted to a percentile
rating.

subsequent-order interactions were analyzed through
the use of stepwise linear multiple regression,

analyzed,

Although a problem of multicollinearity among the
subscales of the TSCS exists, Fitts h

tically, the ition of Kerlinger (1973, p. 624),
involving the calculation of squared semipartial corre-
lation coefficients, was utilized in the multiple re

ous measures of self-concept were relatively
independent of the variance contributed by
the other independent variables (see Table
1). For example, the Social Self subscale
correlated highly with the other measures of
self-concept (identity, r = .70; moral self, r
= .54), but there were low correlations be-
tween the social self and the other (non-
self-concept) independent variables in the
correlation matrix (track, r = .16; grade, r =
.08; rank, r = .16; sex, r = .07). The same
relationship held for the other self-concept
subscales; therefore, each regression equa-
tion provides a psychological portrait of
those students who participate across the
three classifications of the school activity

tionship, that is, different areas or aspects of
self-concept related in different magnitudes
to different types of participation in the
activity program.

The regression equations indicate which
areas of self-concept contributed to dif-
ferent types of participation: For total ac-
tivity participation, the social self, identity,
and the moral-ethical self were contributors

Table 2
Tennessee Self-Concept Scale and Total
Activity

                                     Increase
                                     in R
Step  Variable       R*      F        per step
1     Track          .5192   168.654  --
2     Social self    .5442            .0250
3     Grade          .5608   69.585   .0166
4     Rank           .5788   57.184   .0180
5     Identity       .5802   45.982   .0014
6     Moral self     .5830   38.794   .0028
7     Sex            .5831   33.197   .0001

* All Rs are significant at the .01 level.
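
As a rough consistency check on Table 2, the "Increase in R per step" column can be recomputed as the difference between successive cumulative R values. The sketch below assumes the printed R values read .5192 through .5831 once their decimal points are restored; the variable and function names are illustrative and do not come from the article.

```python
# Cumulative multiple R after each step, as read from Table 2
# (decimal points restored; labels are illustrative only).
steps = [
    ("Track", 0.5192),
    ("Social self", 0.5442),
    ("Grade", 0.5608),
    ("Rank", 0.5788),
    ("Identity", 0.5802),
    ("Moral self", 0.5830),
    ("Sex", 0.5831),
]


def increments(r_values):
    """Return the step-to-step increase in R, rounded to four decimals."""
    return [round(b - a, 4) for a, b in zip(r_values, r_values[1:])]


r_values = [r for _, r in steps]
increase_per_step = increments(r_values)

# Print each variable entered after the first with its increase in R.
for (name, _), inc in zip(steps[1:], increase_per_step):
    print(f"{name:12s} {inc:.4f}")
```

Under this reading, every recomputed difference agrees with the printed increase for steps 2 through 7, which supports reading the garbled step 5 entry as .0014.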


can be noted by reviewing the re-
gression equations (see Tables 2, 3, and 4).

and was added to the table for the nonath-
letic scores. In the same manner, the social
self reached correlational significance at the
.05 level only with the total activity score,
yet, when considered with the other inde-
pendent variables, it reached significance
with all three dependent variables.

The differences between the correlations
of an independent variable with two depen-
dent variables (athletic and nonathletic
scores) were also significant at the .05 level.
For example, the physical self correlated at
.20 with the athletic activity score and at .03
with the nonathletic score. When these
correlations were submitted to the t statistic designed
to determine the significance of differences
between “dependent” correlations, that is,
correlations taken from the same population,
the t statistic for the difference between
these correlations was 2.82 (a t of 1.96 was
necessary for significance at the .05 level and
2.576 for the .01 level), which indicated the
strength of the differential contribution of
the physical self to athletic and nonathletic
participation.
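
The article does not name the statistic used for comparing "dependent" correlations. One standard choice is Hotelling's t, which compares two correlations that share a variable and come from the same sample. The sketch below is offered under that assumption only. The values r12 = .20 and r13 = .03 are from the text; the correlation between the athletic and nonathletic scores (r23) and the sample size (n) are not reported in this passage, so the values used for them are hypothetical placeholders.

```python
import math


def hotelling_t(r12, r13, r23, n):
    """Hotelling's t (df = n - 3) for the difference between two
    correlations, r12 and r13, that share variable 1 and were computed
    on the same sample of n cases. r23 is the correlation between the
    two non-shared variables."""
    # Determinant of the 3 x 3 correlation matrix.
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    return (r12 - r13) * math.sqrt((n - 3) * (1 + r23) / (2 * det))


# r12, r13: physical self with athletic and nonathletic scores (from the text).
# r23 and n: HYPOTHETICAL values chosen only to make the sketch runnable.
t = hotelling_t(r12=0.20, r13=0.03, r23=0.30, n=460)
print(f"t = {t:.2f}")
```

Because r23 and n are placeholders, the printed value should not be expected to reproduce the reported t of 2.82; the sketch only shows how such a comparison can be carried out and why the result depends on the correlation between the two activity scores.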

When the correlation differ-
ences between the independent variables
and the athletic and nonathletic scores were submitted to this test, the
following results were obtained:

1. For track, the correlational difference
between athletic and nonathletic participa-
tion was .13, which yielded a t of 2.409 (sig-
nificant at the .05 level). This can be ex-
pected because track membership accounted
for a smaller amount of variance in the ath-
letic activity regression equation.


Table 3
Interaction of the Independent Variables and
the Athletic Activity Score

                                      Increase
                                      in R
Step  Variable        R*      F        per step
1     og              .4253   540988   --
2     Physical self   .4270   50.848   .0017
3     :               .4487   38.233   .0217
4     self            .4559   29.772   .0072
5     ("TEE           .4582   24.072   .0023
6     .                       20.004   000
7     Niirird P       .4588   17.180   .0002

* All Rs are significant at the .01 level.

2. For sex, the correlational difference was 
.52, which yielded a t of 9.18 (significant at 
the .01 level). This can also be expected
because boys generally participated in ath- 
letic activities, while girls usually selected 
nonathletic activities. 

3. For academic achievement (rank), the 
difference in the correlations between ath- 
letic and nonathletic participation was .21, 
which yielded a t of 3.73 (significant at the
.01 level). This was expected because
nonathletic activities usually require high 
grade point averages for participation and 
many of the nonathletic activities (e.g., band 
and chorus) were populated by that type of 
student Coleman (1961) would call members 
of the high school “leading crowd,” that is, 
students (especially girls) who were college 
prep, good students, and participants in the 
social subgroups of the high school, Often 
participation in team sports activities did not 
demand high grade point average require- 
ments. 

4. In regard to the self-concept, the sig- 
nificant physical self correlational difference 
(see above) can be expected, since partici- 
pation in nonathletic activities would not 
require the satisfaction of one's physical 
development, which would be required for 
participation in the physically more de- 
manding athletic activities. In addition, the 
personal self exhibited a correlational dif- 
ference of .14, which gave a t of 2.32 (signif- 
icant at the .05 level), and the Self-satisfac- 
tion subscale showed a correlational differ- 
ence of .14 (a significant t of 1.42 at the .06 
level), both of which show the differential 
nature of the influence of self-concept on 
participation in the school activity pro- 
gram. 

5. The differential nature of the contri-
bution of independent variables is not lim-
ited to self-concept measures; on the con-
trary, it can be extended to the independent
variables of track, academic achievement,
and sexual classification. The difference for
grade classification was not significant at any
level.

This study did not support the findings of
Phillips's (1969) study vis-à-vis the re-


Table 4
Interaction of the Independent Variables and
the Nonathletic Activity Score

                                       Increase
                                       in R
Step  Variable         R*      F        per step
1     Track            Ame     1654     --
2     Sex              442     uk]      (OAM
3     Grade            Me      a        ILILI
4     Rank             A55     bio      we
5     Social self      Sans    LI       Ll
6     Total positive   .5858   39.348   .0084
7     Moral self       Lud     33.9%    .0019

(n = 173). The calculated 
(significant at the .01 level). This in- 
S difference doss 


students who obtained a score of 30 or less
(participated in less than one major sport) 
were classified as low-frequency participa- 
tors. The mean of the total positive self- 
concept score for high-frequency participa- 
tors was 350.684 (n = 76); the mean for the 
total positive self-concept score for low-fre- 
quency participators was 332.799 (n = 289). 
The calculated t statistic was 4.26. Again, 
the significant difference at the .01 level in
the total self-concept results was contra-
dictory to those obtained by Phillips.
For the nonathletic activity score, stu- 
dents who obtained a score of 80 or higher 
(participated in at least two major activities) 
were classified as high-frequency partici- 
pators; students who obtained a score of 30 
or less (participated in less than one major 
activity) were classified as low-frequency 
participators. The mean for the total posi- 
tive self-concept score for high-frequency 
participators was 345.494 (n = 91); the mean 
for the total positive self-concept score for 
low-frequency participators was 335.003 (n 
= 268). The calculated t statistic was 2.52 
(significant at the .05 level). This signif-
icant mean difference would not support
Phillips's statements concerning the lack of 
differences in self-concept for high- and 
low-frequency participators. 
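
The article reports group means, group sizes, and t values for these comparisons but no standard deviations. Assuming the comparisons were ordinary pooled-variance t tests for independent groups (an assumption; the text does not say which form was used), the pooled standard deviation implied by each reported t can be backed out from the printed figures. The function name below is illustrative.

```python
import math


def implied_pooled_sd(mean_hi, n_hi, mean_lo, n_lo, t):
    """Solve t = (m1 - m2) / (s_p * sqrt(1/n1 + 1/n2)) for s_p,
    the pooled standard deviation implied by a reported t value."""
    return (mean_hi - mean_lo) / (t * math.sqrt(1 / n_hi + 1 / n_lo))


# Total positive self-concept means, group sizes, and t values as reported
# for the athletic and nonathletic activity scores.
athletic_sd = implied_pooled_sd(350.684, 76, 332.799, 289, 4.26)
nonathletic_sd = implied_pooled_sd(345.494, 91, 335.003, 268, 2.52)

print(f"athletic: {athletic_sd:.1f}  nonathletic: {nonathletic_sd:.1f}")
```

Both implied values fall in the low-to-mid 30s, so the two reported t statistics are at least mutually consistent with a common spread of total positive scores under the pooled-variance assumption.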

Clearly, these statistics indicate that not 
only was there a difference in the self-con- 
cept scores of high- and low-frequency par- 
ticipators, but that this difference is signif- 
icant for all three categories of student ac- 
tivities. The findings lead us to question the 
results of Phillips’s study. 

Large numbers of students do not partic-
ipate in the extra- and cocurricular activities
of the school: For the total activity score of 
30 or less (nonparticipation in only one major 
activity), there were 173 students, 37.6% of 
the total sample. When the figures were 
then broken down into athletic and 
nonathletic activities, the rate of nonparti- 
cipation soared: For athletic activity scores 
of 30 or less (nonparticipation in one major 
sport), the student number was 289, 62.9% 
of the total sample; for nonathletic scores of 
30 or less (nonparticipation in one major 
activity), there were 268 students, 58.3% of 
the total sample. All three figures illustrate 
low participation rates for the sample. 

Conclusions 


Perhaps the most important finding of the 
study was the light shed on the Rehberg 
(1969) hypothesis. Although the present 
study did not examine self-concept as an 
intermediate variable between academic 
achievement and participation in inter- 
scholastic sports, it did establish the im- 
portance of self-concept in the relationship 
between academic achievement and partic- 
ipation not only in athletics but in nonath- 
letic activities and the school activity pro- 
gram as a whole. One concern of this study 
was to expand the Rehberg thesis to nonin- 
terscholastic activities. 

One need only examine the regression equa- 
tions in Table 3 included in the study to find 
the postulated relationship set forth by 
Rehberg: Not only are academic achieve- 
ment and participation related, there is also 
a strong relationship between both academic 
achievement and participation and the 
scores the students obtained on four separate 
measures of self-concept when all the vari- 
ables were included in the regression equa- 
tion. By examining the correlational matrix, 
one can find relationships significant at the 
.05 level between athletic participation and 
two additional measures of self-concept. 

In addition to the relationship between 
self-concept and athletic participation, one 
cannot deny the same type of relationship 
between self-concept and the activity pro- 
gram as a whole and the nonathletic activity 
program in particular. In effect, the basic 
hypotheses of this study examined Rehberg's 
postulate in areas beyond the narrow scope 
of interscholastic sports participation. The 
three regression equations indicate research 
data that supports the application of the 
Rehberg thesis to areas other than inter- 
scholastic sports. 

This study was one of the first in the field 
of school activities to combine psychological 
variables with previously researched per- 
sonal variables to attempt to answer ques- 
tions raised by Rehberg in 1969. It pro- 
duced psychological profiles of students who 
participate in school activities; it also dis- 
pelled the myth that school activities appeal 
equally to every student and that school ac- 
tivities are used by large numbers of students 
to complete their high school life experi- 
ence. 

The information provided by the study 
can be used by researchers to explore the 
area of student involvement in school life 
and by administrators to assess who partic- 
ipates in their programs as they try to mea- 
sure the success or failure of tax dollars spent 
yearly to develop extra- and cocurricular 
programs. 

Children’s Achievement Attributions and Self-reinforcement: 
Effects of Self-concept and Competitive Reward Structure 

This experiment studied how self-concept as a dispositional trait influences 
children’s achievement attributions and reinforcing behaviors in the social 
context of a competing or a noncompeting other. There were 112 fifth-grade 
boys and girls classified as high or low in self-concept who worked in pairs at 
an achievement-related task in which one succeeded and one failed. Results 
showed that high self-concept children attributed success outcomes more to 
their high ability and engaged in more positive self-reinforcement following 
success than did low self-concept children. The affective significance of 
achievement outcomes was accentuated in competitive settings for high but 
not low self-concept children. The results were discussed within an attribu- 
tion model of behavior. 


Children’s feelings about themselves and 
beliefs about their abilities may be expected 
to influence both their behavior and inter- 
pretations of achievement-related experi- 
ences (Feather, 1971; Nicholls, 1976; Weiner, 
1972). In a classroom setting, these inter- 
pretations involve explanations about the 
causes of achievement outcomes that may 
have important implications for one's sub- 
sequent achievement-oriented behaviors 
(Weiner et al., 1971), self-evaluations (Ames, 
Ames, & Felker, 1977), and interpersonal 
relationships (Feather & Simon, 1971). 
Previous research has shown that certain 
causal explanations, more than others, are 
likely to be inferred for achievement out- 
comes as a function of certain dispositional 
and informational variables (Ames, Ames, & 
Felker, 1976; Frieze & Weiner, 1971; Weiner 
et al., 1971). The purpose of the present 
study was to determine how self-concept and 
sex as dispositional traits and performance 
outcome and reward structure as informa- 
tional cues interact to influence a child’s 
achievement attributions and consequent 
self and interpersonal evaluations. Specif- 
ically, the present study examined how 
children high and low in self-concept react 
to success and failure achievement outcomes 
in the social context of a competing or a 
noncompeting other. 

Self-concept can be conceived as a set of 
beliefs about the self that are presumed to be 
a dominant feature in social perception and 
resulting attributional and self-evaluational 
processes (Ames, 1975; Epstein, 1973). 
Therefore, children who differ in self-con- 
cept may be expected to differ in their at- 
tributions for achievement outcomes 
(Feather, 1969; Fitch, 1970; Nicholls, 1976). 
A self-consistency hypothesis derived from 
Heider's balance theory predicts that per- 
sons interpret events in a way that is con- 
sistent with their own self-evaluation 
(Feather, 1971; Jones, 1973). Thus, high 
self-concept persons should be motivated to 
maintain their positive self-evaluation by 
attributing positive or successful experiences 
to their own personal characteristics. 
Conversely, low self-concept persons should 
maintain their negative self-evaluation by 
attributing negative or failure experiences 
to their own personal inadequacies. Ability 
and effort are both personal causal factors 
that are likely to affect achievement out- 
comes; ability, however, has been described 
as more stable across tasks and situations 
than is effort (see Weiner et al., 1971). In 
the present study, it was assumed that self- 
concept reflects a relatively stable self- 
evaluation based on consistent patterns of 
social and achievement-related experiences 
over time and, as such, should affect attri- 
butions to the more stable factor of ability 
than effort. 

A child's self-concept should also influ- 
ence his or her self-rewarding behavior 
(Bandura, 1971). Studies on children's 
self-reinforcement patterns have distin- 
guished contingent and noncontingent types 
of self-reward. Contingent self-reward has 
been designated as a measure of deserving- 
ness and has been found to be related to the 
informational cues in the situation such as 
performance outcome (Masters, 1972). 
That is, children feel they deserve more re- 
ward for success than failure outcomes. 
Noncontingent self-reinforcement, however, 
has been found to be more related to the af- 
fective [value] of the event and has been 
hypothesized to indicate a form of “self- 
congratulations” following success and 
“self-therapy” following failure (Bandura & 
Whalen, 1966; Masters, 1972; Mischel, 
Coates, & Raskoff, 1968). 

Weiner et al. (1971) have shown that 
causal attributions determine affective re- 
[...]tions of noncontingent self-reinforcement, 
high more than low self-concept [...] expected [...] 
persons may be able to engage in more pos- 
itively oriented [...]. In 
contrast, if the low self-concept person in- 
ternalizes failure, the arousal of negative 
affect should lead to more self-punitive or 
critical behavior. 

Classroom performance situations, how- 
ever, involve group settings that differ from 
nonsocial settings in that they permit social 
comparison and involve some type of reward 
structure (e.g., competitive) that may elicit 
certain social or achievement-based motives 
(Ames et al., 1977; Johnson & Johnson, 1974; 
Scott & Cherrington, 1974). Furthermore, 
the social context of these settings elicits 
attributions for the performance of others in 
the group as well as for one's own perfor- 
mance. 

A recent study by Ames et al. (1977) 
compared the effects of competitive and 
noncompetitive reward contingencies on 
children's attributional behavior. Com- 
petitive reward structures produced some 
ego-enhancing strategies following success 
outcomes and resulted in strong self-derog- 
atory behavior following failure outcomes. 
Their results suggest that competitively 
oriented achievement settings increase the 
affective significance of success and failure 
experiences. As such, the affectively 
arousing conditions of competition ought to 
differentially bias the attributions of high 
and low self-concept persons. While suc- 
cessful experiences should be more ego 
enhancing in competitive than noncompet- 
itive situations, these effects should be 
maximized for high self-concept persons. 
That is, in competitive settings, high self- 
concept persons should perceive themselves 
as more capable and engage in more self- 
congratulatory behavior than low self-con- 
cept persons. Likewise, the effects of failure 
should be more devastating in a competitive 
setting, leading to increased self-therapy by 
the high self-concept child but to increased 
self-punishment by the low self-concept 
child. 

A limiting factor in the Ames et al. (1977) 
study was that the children were exclusively 
[...]. There is an abundance of evidence 
to suggest that success and failure outcomes 
are experienced differently by males and 
females [...] reward consequences 
for another person (Crockenburg, Bryant, & 
Wilce, 1976; Feather & Simon, 1971). In the 
presence of a coacting other, females may 
tend to be more self-effacing than males 
when attributing their own and the other's 
performance. While males may tend to 
self-reinforce more in competitive settings 
in which they have outperformed another, 
females may self-reinforce more in non- 
competitive conditions, where their own 
achievements do not imply negative conse- 
quences for another. In addition, failure in 
competitive conditions may be more ego 
threatening to males than females; as a re- 
sult, males may engage in more self-therapy 
behavior. In sum, the present study was 
designed to test the effects of performance 
outcome and reward structure cues on high 
and low self-concept children's attributional 
and reinforcing behaviors. 


Method 


Subjects 


The initial pool of subjects included 192 fifth-grade 
children (101 boys and 91 girls) from three Indiana 
county schools who were administered an abbreviated 
version of the Piers-Harris Self-Concept Scale (Piers 
& Harris, 1964). The Piers-Harris scale consists of 80 
items assessing children's positive and negative self- 
evaluations across a broad range of behaviors (e.g., “I 
am smart,” “I do many bad things,” “I have good ideas,” 
and “I have many friends”). The scale was developed 
for children in Grades 3 through 10; for children in the 
fifth grade, the scale has yielded internal consistency 
coefficients of approximately .90 and retest reliability 
coefficients in the .70s over a 4-month period. The 
Children's Manifest Anxiety Scale was also adminis- 
tered to provide a validity check on the self-concept 
classification (see Castaneda, McCandless, & Palermo, 
1956, for a complete description of the scale). Children 
scoring in the top third (n = 28 for both males and fe- 
males) and the bottom third on the Piers-Harris [...] 
[...] controlling for level of self-concept and sex. The children 
were tested in like-sex pairs, with each member of the 
[...] outcome conditions, that is, one 
[...] within the same 
[...] (c) they had scored [...] Skills; 
[...] restriction was employed to reduce the knowledge that 
one child would have of the other's academic abilities. 
The pairing according to achievement test scores was 
an added precaution intended to equalize any self-other 
performance expectancies within each pair. 


Task 


The task involved sets of achievement-related puzzles 
(see Ames et al., 1977). Each puzzle involved a line 
diagram approximately 1.5 inches (3.81 cm) square. 
The child's task was to trace over all the lines of the 
[...] 

[...] each of them would be able to select a 
prize for their [...] under the guise of “[...] 
us make games for children your age.” 

[...] strated that the reduced forms of each scale do not alter 
measurement of the traits assessed by the full scale in 
[...] 

Subjects were first asked to solve a practice set of four 
puzzles (two solvable and two insolvable), establishing 
similar expectancy levels for the main task. The in- 
duction of similar expectancies was additionally in- 
tended to heighten competitive strivings in the com- 
petitive conditions and to reduce social comparison 
tendencies in the noncompetitive conditions. At the 
conclusion of about a 3-minute practice period, each 
child announced the number of correctly solved 
puzzles.2 Before beginning the main task, the experi- 
menter restated the reward contingencies, that is, each 
child would get to select a prize after this set of puzzles 
in the noncompetitive conditions, or the winner of the 
next set of puzzles would get to select a prize in the 
competitive conditions. After a prespecified 5-minute 
time limit, both children were asked to announce their 
scores; then, either the winner or both children selected 
a prize, depending upon the structure. The dependent 
measures were then administered. As part of debrief- 
ing, the children were asked to solve another set of 
puzzles. The outcomes were reversed, so that the pre- 
viously losing child won in the competitive conditions 
and received a prize. In the noncompetitive conditions, 
the previous loser won, but no prizes were given. 


Dependent Measures 


Using a questionnaire format, children were asked to 
"estimate the contribution of ability, effort, luck, and 
task difficulty to their performance."3 The children 
were asked to circle from one to nine crosses on their 
paper according to how much each factor contributed 
to their performance. This assessment technique was 
adopted from a procedure described by Friend and 
Neale (1972) and later by Ames et al. (1976). Subjects 
were given instructions as follows: 


Here are nine crosses. I want you to tell me how 
skillful you think you were in solving the puzzles by 
circling the crosses. If you think you were very 
skillful, count out and circle seven, eight, or nine 
crosses. If you think you were skillful, circle four, 
five, or six crosses. If you think you were not skillful, 
circle one, two, or three crosses. 
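
The instructions above map the number of circled crosses onto three verbal bands. A minimal sketch of that mapping follows; the function name `ability_band` and the band labels as return values are illustrative choices, not part of the original procedure, and the analyses reported later use the raw 1-9 ratings rather than the bands.

```python
def ability_band(crosses_circled: int) -> str:
    """Return the verbal band the child was told corresponds to a rating.

    Bands follow the instructions quoted in the text:
    7-9 crosses = very skillful, 4-6 = skillful, 1-3 = not skillful.
    """
    if not 1 <= crosses_circled <= 9:
        raise ValueError("a rating must be between one and nine crosses")
    if crosses_circled >= 7:
        return "very skillful"
    if crosses_circled >= 4:
        return "skillful"
    return "not skillful"


# Example: band for every possible rating on the nine-cross scale.
bands = {rating: ability_band(rating) for rating in range(1, 10)}
```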


Similar instructions were given for the remaining fac- 
tors. To help the child, an abbreviated form of the scale 
was presented on an index card. Subjects were then 
asked to attribute the performance of the other child, 
using the same procedure (e.g., “Now, think about the 
other girl [boy]. How skillful do you think she [he] was 
in solving the puzzles?”). 
After the attributional assessments, self-other ad- 
ministrations of deserved reward (contingent rein- 
forcement) were measured as follows: 


Self-reward. Here are 10 stars. Circle the number 
of stars you feel you deserve for how you did on the 
puzzles. 

Other-reward. How many stars do you think the 
other boy [girl] deserves for how he [she] did? [Re- 
sponse format was the same as the preceding ques- 
tion.] 

Measures of noncontingent self-reinforcement were 
obtained by reading 15 randomly ordered statements 
to the subjects, including 5 self-congratulations or 
positive self-reinforcing statements ("I'm a good 
worker," “I feel good,” “I feel smart,” “I really did a good 
job,” and “I’m good at this work”), 5 self-therapy 
statements (“I do things correctly most of the time,” 
“Sometimes I fail but sometimes I succeed,” “It really 
wasn’t important to do well,” “I could do better another 
time,” and “I feel sure I can do better work"), and 5 
self-critical or punitive statements (“I feel bad,” “I 
really did a lousy job,” “I feel dumb,” “I can’t do any- 
thing right,” and “I’m not good enough”). Subjects 
were given the following instructions: 


Sometimes after we do well or not very well on 
schoolwork, we feel like saying things to ourselves. 
Now think about the set of puzzles you just finished. 
What do you feel like saying to yourself? I’m going 
to read some statements, and I want you to circle YES 
if the statement describes how you feel and circle NO 
if the statement does not describe how you feel. 


A score was obtained for each scale (congratulations, 
therapy, and criticism) by adding the number of state- 
ments to which the subject responded “yes.” 
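
The scoring rule just described (one point per “yes” within each five-statement scale) can be sketched as follows. The statement wording is taken from the text above; the names `SCALES` and `score_scales`, and the representation of a child's answers as a set of endorsed statements, are assumptions made for illustration.

```python
# The 15 statements, grouped by scale as listed in the text.
SCALES = {
    "congratulations": [
        "I'm a good worker",
        "I feel good",
        "I feel smart",
        "I really did a good job",
        "I'm good at this work",
    ],
    "therapy": [
        "I do things correctly most of the time",
        "Sometimes I fail but sometimes I succeed",
        "It really wasn't important to do well",
        "I could do better another time",
        "I feel sure I can do better work",
    ],
    "criticism": [
        "I feel bad",
        "I really did a lousy job",
        "I feel dumb",
        "I can't do anything right",
        "I'm not good enough",
    ],
}


def score_scales(yes_statements):
    """Count, for each scale, the statements the child circled YES for."""
    return {
        scale: sum(1 for statement in statements if statement in yes_statements)
        for scale, statements in SCALES.items()
    }


# Hypothetical child who endorsed three statements.
example_scores = score_scales(
    {"I feel good", "I feel smart", "I could do better another time"}
)
```

Each scale score therefore ranges from 0 to 5.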

The initial selection and classification of these feed- 
back statements were based on a pretesting with college 
students. From a pool of 25 statements, 40 students 
were asked to classify each statement according to the 
following instructions: “Is this statement something 
you might say to yourself when you want to (1) con- 
gratulate yourself for a job well done, (2) make yourself 
feel better when you have performed poorly, or (3) 
criticize or punish yourself when you have performed 
poorly?” Within each category, five statements were 
selected that attained the highest percentage of agree- 
ment. The mean percentages of agreement were .98 for 
the self-congratulations, .87 for the self-therapy, and 
.89 for the self-criticism categories. 
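
One way the pretest selection could work is sketched below: each statement is assigned to the category most raters chose, and the statements with the highest agreement are kept within each category. The text does not describe the exact computation, so the majority-vote rule, the function name `select_statements`, the category labels, and the toy ratings (including the statement "I tried hard") are all hypothetical.

```python
from collections import Counter


def select_statements(ratings, per_category=5):
    """Pick the highest-agreement statements within each category.

    ratings maps each statement to the list of category labels that the
    raters assigned to it. Agreement is the share of raters choosing the
    statement's most common category (an assumed rule, not from the source).
    """
    by_category = {}
    for statement, labels in ratings.items():
        category, votes = Counter(labels).most_common(1)[0]
        agreement = votes / len(labels)
        by_category.setdefault(category, []).append((agreement, statement))
    return {
        category: sorted(candidates, reverse=True)[:per_category]
        for category, candidates in by_category.items()
    }


# Toy data for 40 hypothetical raters; not the study's pretest results.
toy_ratings = {
    "I feel good": ["congratulate"] * 39 + ["therapy"],
    "I tried hard": ["congratulate"] * 30 + ["therapy"] * 10,
    "I feel bad": ["criticize"] * 36 + ["therapy"] * 4,
}
selected = select_statements(toy_ratings, per_category=1)
```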


Results 


Check on Self-concept Classification 


To determine the validity of the self- 
concept classification and the randomization 
method, a 2 X 2 X 2 X 2 (Outcome X Self- 
concept Level X Sex X Reward Structure) 
factorial analysis of variance was computed 
for subjects’ scores on the Piers-Harris and 
the Children's Manifest Anxiety scales. 


2 Sufficient time was always given so that each child 
had a chance to try all the puzzles. Generally, the al- 
lotted time was adequate. 

3 Although no hypotheses were made concerning ef- 
fort, task, or luck attributions, Weiner et al. (1971) have 
shown that primarily these four factors are associated 
with achievement outcomes. Thus, it was decided to 
include measures of the four factors. 

Significant main effects on self-concept level 
were obtained on both scales. Providing 
support for the validity of the a priori self- 
concept classification, subjects classified as 
high in self-concept attained significantly 
higher Piers-Harris scores, F(1, 96) = 356.85, 
p < .001, but lower anxiety scores, F(1, 96) 
= 122.27, p < .001, than those classified low 
in self-concept. A significant sex difference 
was also found on the anxiety measure; fe- 
males were found to have higher anxiety 
scores than males, F(1, 96) = 9.18, p < .01. 
There were no other significant main or in- 
teraction effects on either measure, indi- 
cating that the randomization process was 
nonbiased. 
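
The error degrees of freedom in these F ratios follow from the design. The sketch below assumes a fully between-subjects factorial with equal cell sizes and all 112 children retained, which the reported df of 96 implies but the text does not spell out; the function name `factorial_design` is illustrative.

```python
from math import prod


def factorial_design(n_subjects, levels_per_factor):
    """Summarize a balanced between-subjects factorial design.

    Returns the number of cells, subjects per cell, and error degrees of
    freedom (subjects minus cells).
    """
    n_cells = prod(levels_per_factor)
    if n_subjects % n_cells != 0:
        raise ValueError("subjects cannot be divided equally among cells")
    return {
        "cells": n_cells,
        "per_cell": n_subjects // n_cells,
        "error_df": n_subjects - n_cells,
    }


# Outcome x Self-concept Level x Sex x Reward Structure, 112 children.
full_design = factorial_design(112, [2, 2, 2, 2])

# A three-factor analysis restricted to one outcome group (56 children).
outcome_subset = factorial_design(56, [2, 2, 2])
```

Under these assumptions the full design has 16 cells of 7 children and 96 error degrees of freedom, and a three-factor analysis on 56 children has 48.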


Analysis of Self Ratings 


To examine the data on self-attribution 
and self-evaluation, self ratings on measures 
of attribution and deserved reward were 
analyzed using 2 X 2 X 2 X 2 factorial anal- 
yses of variance with two levels of each fac- 
tor: outcome (success and failure), self- 
concept level (high and low), sex, and reward 
structure (noncompetitive and competitive). 
The major findings involved ratings of abil- 
ity and deserved reward; Table 1 presents 
the means for these variables. 

An Outcome X Self-concept X Reward 
Structure interaction on ratings of ability (F 
= 5.00, p < .05) revealed that the ability at- 
tributions of the high versus the low self- 
concept groups differed in competitive 
conditions (simple interaction effect, F = 
10.59, p < .01) but not in noncompetitive 
conditions. In the competitive reward 
contingencies, high self-concept children 
rated their ability higher than low self-con- 
cept children following success but lower 
than low self-concept children following 
failure (simple main effects, F = 5.60, p < .05 
and F = 5.00, p < .05, respectively). Addi- 
tionally, there was a marginal tendency for 
high self-concept children to rate their 
ability higher when they succeeded in com- 
petitive than in noncompetitive conditions 
(F = 3.89, p < .06). The ability ratings of 
the low self-concept children, however, did 
not differ between the reward conditions. 

Of further interest are the significant 
two-way interaction findings indicating that 

Table 1 
Means for Self-Other Ratings in the Reward 
Structure Conditions for Each Level of 
Self-concept (SC) 

                                Ability*        Deserved reward 
Condition                     Self    Other     Self    Other 
Success outcomes 
  High SC noncompetitive      5.64    6.07      6.14    6.43 
  High SC competitive         6.71    6.29      6.57    4.93 
  Low SC noncompetitive       5.36    5.50      5.93    5.71 
  Low SC competitive          5.43    6.29      5.14    4.00 
Failure outcomes 
  High SC noncompetitive      4.57    6.71      4.14    6.71 
  High SC competitive         4.07    8.29      3.29    8.21 
  Low SC noncompetitive       4.36    7.64      3.93    7.71 
  Low SC competitive          5.29    8.07      3.50    7.14 

Note. The higher the mean, the higher the ability ratings and 
the greater the reward. 
* The sexes have been combined within each cell (n = 14). 


performance outcome differentially in- 
fluenced high and low self-concept children's 
ability ratings (F = 5.60, p < .05). Children 
high in self-concept perceived themselves as 
more skillful after succeeding (M = 6.17) 
than after failing (M = 4.32; simple main 
effect, F = 23.38, p < .001). Low self-con- 
cept children, however, rated their ability 
similarly after success and failure outcomes 
(Ms = 5.39 and 5.32, respectively; simple 
main effect, ns). High self-concept subjects 
also attributed more ability to themselves for 
success than did low self-concept subjects 
(simple main effect, F = 4.19, p < .05). For 
failure outcomes, the self-concept groups did 
not differ in their ability attributions. 
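
The means quoted for high self-concept children can be recovered from the Table 1 self-ratings of ability. Because every cell has the same size (n = 14), the outcome mean is the simple average of the noncompetitive and competitive cells. The sketch below shows that arithmetic; the variable and function names are illustrative.

```python
# Self-ratings of ability for high self-concept children (Table 1).
high_sc_ability_self = {
    "success": {"noncompetitive": 5.64, "competitive": 6.71},
    "failure": {"noncompetitive": 4.57, "competitive": 4.07},
}


def outcome_means(cells_by_outcome):
    """Average equal-sized reward-structure cells within each outcome."""
    return {
        outcome: sum(cells.values()) / len(cells)
        for outcome, cells in cells_by_outcome.items()
    }


high_sc_means = outcome_means(high_sc_ability_self)
print({outcome: f"{mean:.3f}" for outcome, mean in high_sc_means.items()})
```

The averages come to about 6.175 and 4.32, in line with the reported M = 6.17 and M = 4.32.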

As would be expected, the outcome of the 
performance had significant main effects on 
ratings of ability (F = 19.99, p < .001), task 
difficulty (F = 6.35, p < .05), luck (F = 55.72, 
p < .001), and deserved reward (F = 44.52, 
p < .001). Table 2 shows that successful 
subjects rated their own ability and luck 
better, their task easier, and felt more de- 
serving of reward after success than did 
failing subjects. Sex differences were found 


4 The degrees of freedom for all analyses are 1, 96, 
except as noted on the noncontingent reinforcement 
measures. 

Table 2 
Means for Self and Other Ratings in Success 
and Failure Outcome Conditions 

                          Task                  Deserved 
Condition    Ability*     difficulty    Luck    reward 
Success 
  Self       5.79         4.98          6.27    5.95 
  Other      6.04         5.88          4.84    5.27 
Failure 
  Self       4.57         5.80          3.68    3.71 
  Other      7.68         4.20          6.95    7.45 
* n = 56 per cell. 


on ratings of the task difficulty (F = 4.34, p 
< .05). Males tended to perceive their task 
as more difficult (M = 5.73) than did females 
(M = 5.05). 

Noncontingent self-reinforcement mea- 
sures. Children's tendencies to engage in 
self-congratulations following success and 
self-criticism and self-therapy following 
failure were analyzed by 2 X 2 X 2 (Self- 
concept X Sex X Reward Structure) factorial 
analyses of variance. Means for these vari- 
ables are presented in Table 3. 

A highly significant main effect on the 
self-congratulations measure showed that 
high self-concept children engaged in sig- 
nificantly more self-congratulatory behavior 
following success (M = 4.50) than did low 
self-concept children (M = 3.21), F(1, 48) = 
16.00, p < .001. 

Children's self-critical behavior following 
failure was jointly affected by their self- 
concept level and the reward contingencies 
in a Self-concept X Reward Structure in- 
teraction, F(1, 48) = 8.86, p < .01. Table 3 
shows that high self-concept children tended 
to be more self-critical following failure in 
competitive than noncompetitive conditions, 
whereas low self-concept children were more 
self-critical in noncompetitive conditions; 
simple main effects were identical, F(1, 48) 
= 4.43, p < .05. Furthermore, low self- 
concept subjects self-administered more 
punitive statements than did the high self- 
concept subjects in noncompetitive condi- 
tions [simple main effect, F(1, 48) = 8.89, p 
< .01], but their self-punitive behavior was 
not statistically different in competitive 
conditions. 

A Sex X Reward Structure interaction was 
obtained on children's tendency to engage in 
self-therapy following failure, F(1, 48) = 
4.12, p < .05. Males tended to self-admin- 
ister more therapy statements after they 
failed in competitive (M = 4.29) than non- 
competitive situations (M = 3.57); the sim- 
ple main effect was F(1, 48) = 4.88, p < .05. 
Females' self-therapy behavior was not af- 
fected by the reward contingencies (Ms = 
3.93 and 3.71, respectively). 


Analysis of Self-Other Ratings 


Separate 2 X 2 X 2 X 2 X 2 (Outcome X 
Self-concept X Sex X Reward Structure X 
Self-Other) analyses of variance with re- 
peated measures on the last factor were 
performed on ratings of attribution and de- 
served reward. Since the major hypotheses 
associated with these analyses concerned 
differences in self versus other perceptions, 
significant findings that involve the within 
sources of variance are reported. 

Significant higher-order interaction ef- 
fects (Outcome X Self-concept X Reward 
Structure X Self-Other) were obtained on 
attributions of ability (F = 7.18, p < .01) and 
on ratings of deserved reward (F = 5.01, p < 
.05). When failure was experienced, the 
reward contingencies influenced how high 
and low self-concept children rated their own 
and the other's ability and deservingness 
(simple effects, F = 5.23, p < .05 and F = 
5.11, p < .05, respectively). Specifically, 

Table 3 
Means for Noncontingent Self-reinforcement 
Measures 

                    Self-congrat-    Self-criti-    Self- 
                    ulations         cism           therapy 
                    after            after          after 
Condition*          success          failure        failure 
High self-concept 
  Noncompetitive    4.28             [...]          3.79 
  Competitive       4.71             1.29           4.00 
Low self-concept 
  Noncompetitive    3.57             1.64           3.72 
  Competitive       2.85              .79           4.00 

* The sexes have been combined within each cell (n = 14). 

high self-concept children who failed at- 
tributed more ability and administered more 
reward to the other in competitive than 
noncompetitive conditions (simple main 
effects, F = 7.69, p < .01 and F = 4.42, p < 
.05, respectively; see Table 1). In contrast, 
low self-concept subjects who failed per- 
ceived the other as equally more capable and 
more deserving in both reward structures. 
When the outcome was success, high and low 
self-concept children's self-other attribu- 
tions and rewarding behaviors were not dif- 
ferentially affected by the reward con- 
tingencies; simple effects tests were nonsig- 
nificant. 

When the self-concept groups were com- 
bined, significant Outcome X Reward 
Structure X Self-Other interactions oc- 
curred on attributions of luck (F = 12.67, p 
< .001) and on ratings of deserved reward (F 
= 10.51, p < .01). Subjects succeeding in 
competitive and noncompetitive reward 
structures differentially rated their own and 
the other's luck and deservingness of reward 
(simple interaction effects, F = 8.92, p < .01 
and F = 6.67, p < .05, respectively). In 
competitive conditions, successful subjects 
indicated that they had better luck and de- 
served more reward than the other (simple 
main effects, F = 27.47, p < .001 and F = 
12.68, p < .01, respectively). Subjects suc- 
ceeding in noncompetitive conditions, 
however, did not rate the other’s luck and 
deservingness different from their own. 
Failing subjects also rated their own and the 
other’s luck and deservingness as a function 
of the reward structures (simple interaction 
effects, F = 4.19, p < .05 and F = 4.01, p < 
.05, respectively). Simple main effects tests 
showed that failing subjects perceived the 
other as having better luck in competitive 
than in noncompetitive conditions (F = 7.32, 
p < .01) but were nonsignificant on the rat- 
ings of deservingness. 

Self-other perceptions were strongly in- 
fluenced by the performance outcome on 
ratings of ability (F = 54.66, p < .001), task 
difficulty (F = 40.83, p < .001), luck (F = 
105.85, p < .001), and deserved reward (F = 
127.19, p < .001). Simple main effects 
showed that subjects who succeeded in the 
presence of a failing other rated their own 
task easier (F = 10.42, p < .01), their luck 
better (F = 19.58, p < .001), and their de- 
servingness of reward greater (F = 6.02, p < 
.05) than the other's. In contrast, subjects 
who failed when their partner succeeded 
rated their own ability lower (F = 129.28, p 
< .001), their luck worse (F = 102.47, p < 
.001), their task more difficult (F = 33.75, p 
< .001), and their deservingness less (F = 
[...], p < .001) than the other's (see Table 
2). 

Sex differences in self versus other per- 
ceptions occurred on ability attributions (F 
= 10.46, p < .01). Females were found to 
attribute more ability to the coacting other 
(M = 7.25) than did males (M = 6.46) re- 
gardless of the outcome (simple main effect, 
F = 8.44, p < .01). 


Discussion 


The results of the present study demon- 
strated that high and low self-concept chil- 
dren differ in their cognitive and affective 
reactions to success and failure experiences 
in social settings that involve reward con- 
tingencies. As informational cues, perfor- 
mance outcome and reward structure dif- 
ferentially influenced high and low self- 
concept children's self and other attributions 
and reinforcing behaviors. Overall, the re- 
sults suggest that high and low self-concept 
children have discrepant cognitions about 
the role of ability in their achievement per- 
formances, and that while competitive set- 
tings increase the affective value of one’s own 
and another’s outcome, this effect is greater 
for high than low self-concept children. 


Self-concept and Outcome Cues 


One of the main purposes of the present 
study was to investigate the effects of per- 
formance outcome on the self-attributional 
and reinforcing behaviors of children clas- 
sified as high and low in self-concept. The 
findings showed that the attributions of high 
and low self-concept children involve a dif- 
ferential bias toward inferring ability as an 
explanatory construct for achievement 
outcomes. Children high in self-concept 
perceived themselves as more capable fol- 
lowing success than failure; they tended to 
perceive an ability-outcome covariation. 
There was no evidence, however, that success 
more than failure elicited higher ability at- 
tributions by low self-concept children. 
Finding that the low self-concept children 
did not use outcome cues as informative 
feedback about their own ability suggests 
that this group reacts to success and failure 
much like the “learned helpless” children 
described by Dweck and Reppucci (1973).
That is, they perceived little or no relation- 
ship between the valence of their perfor- 
mance and their own sense of personal cau- 
sation. 

Providing partial support for a self-con- 
sistency hypothesis, high self-concept 
subjects were found to attribute success 
outcomes internally to their high ability 
more than did low self-concept subjects. 
Since internal attributions (e.g., ability) in- 
crease the affective consequences of 
achievement outcomes (see Weiner et al., 
1971), high self-concept persons would also 
be expected to experience greater pride for 
success than would low self-concept subjects. 
Consistent with these predictions, high 
self-concept children engaged in more self- 
congratulatory behavior following success 
than did low self-concept children. Thus, 
high more than low self-concept children 
appear to respond to successful perfor- 
mances with high estimates of their own 
ability and with positive affect, as evidenced 
by their positive self-reinforcing behavior. 


Self-concept and Reward Structure Cues 


Achievement settings that maximize social 
comparison, as in the competitive conditions 
of the present study, would be expected to 
increase ability attributions (Frieze & 
Weiner, 1971) as well as increase the affec- 
tive value of success and failure outcomes 
(Ames et al., 1977; Scott & Cherrington, 
1974). Thus, the experience of success in 
the competitive conditions was expected to 
be more esteem enhancing than success in 
noncompetitive conditions. Further, this 
effect was expected to be greater for high 
than low self-concept children. Competi- 
tion did produce marked discrepancies in 
high versus low self-concept children’s 
ability attributions following success out- 
comes, with the high group exceeding the low 
group. In addition, high self-concept children
tended to attribute more ability to
themselves following success in competitive
than noncompetitive conditions, whereas
low self-concept children's ability attributions
were essentially identical in both reward
structures. Competitive reward contingencies,
however, did not lead to increased
self-congratulatory behavior by the high
self-concept group. Thus, competition does
appear to strengthen the mediating attributional
link between success outcomes and
subsequent self-reinforcement for the high
self-concept child, but whether it also increases
the feeling of self-congratulations
remains uncertain.

Contrary to expectations, failing in competitive
conditions resulted in more negative
consequences for high than low self-concept
children. The competitive loss produced
lowered perceptions of their own ability
compared to the low self-concept group, increased
self-criticism, and inflated perceptions
of the others’ ability and deservingness.
It is conceivable that these self-derogatory
behaviors over time could begin to undermine
their own positive self-appraisal and
contribute to a learned helplessness belief.
Further, the differences in self-other perceptions
may have negative implications for
future social encounters. A recent study by
Crockenburg et al. (1976) suggested that
"norms of deservingness" may generalize to
other settings; as such, perceiving the other
as more capable and worthy than oneself
may pose a potential threat to one's own
success in other realms of behavior.
Whereas similar self-other perceptions may
promote continued positive interaction,
highly discrepant self-other perceptions may
make it difficult to maintain a positive relationship.
In contrast to the high self-concept
group, low self-concept children
reacted to failure experiences with as much
self-criticism in the competitive conditions
but with even more self-criticism in the
noncompetitive settings. It appears that the
low self-concept child is predisposed toward
self-punitive behaviors for negative events,
particularly when there are no externally
imposed consequences as in the noncompetitive
conditions.

For both self-concept groups, luck seemed
to be a dominant attributional cue for attributing
their own and the other’s performance
outcome in the competitive reward
structures (see also Ames et al., 1977). 
Competing subjects attributed success to 
good luck and failure to a lack of luck re- 
gardless of whether the outcome was their 
own or the other’s. Noncompeting subjects, 
however, did not make differential self-other 
luck attributions. These results and those 
of the previous Ames et al. study suggest that 
competitively experienced outcomes may 
promote an external locus of orientation and 
suppress a more internal orientation neces- 
sary to future achievement-directed behav- 
ior. 


Sex Differences 


Consistent with expectations, females 
were more self-effacing in the presence of a 
coacting partner than were males in that 
they attributed more ability to the coacting 
other than to themselves regardless of the 
performance outcome. Also consistent with 
the hypotheses and with findings reported 
by Crockenburg et al. (1976), the conse- 
quences of failing in competitive situations 
appear to have been more ego threatening 
for males than females. Males engaged in 
more self-therapy behavior following failure 
in competitive than noncompetitive settings, 
whereas females did not differ in frequency 
of self-therapy between the reward struc- 
tures. Finally, some evidence of a defensive 
bias was found in males' ratings of task dif- 
ficulty in that they perceived their task as 
more difficult than females perceived their 
own task. Generally, while the attributional 
and self-reinforcing behavioral differences 
between males and females were not perva- 
sive, they support previous research findings 
that indicate that males tend to be more 
achievement oriented and females more 
deferent in their attributions and achievement
behavior (Crockenburg et al., 1976;
Nicholls, 1976).


Theoretical Implications 


Weiner (1972) has shown that divergent 
attributions determine consequent affective 
experiences and achievement-related behaviors
according to the following model:
Antecedents (individual-difference factors
and informational cues) → Attributions for
Success and Failure → Affective Reactions
(self-reinforcement). In the present study,
the behavior of the high self-concept chil- 
dren conformed to this attributional model 
in that performance outcome and reward 
structure cues significantly influenced their 
attributions and resulting affective experi- 
ences. Furthermore, the evidence strongly 
suggests that the cognitive and reinforcing 
mechanisms that high self-concept children 
use for monitoring their own behavior are 
likely to produce heightened self-esteem 
following success experiences. The high- 
ability attribution-positive-affect link in the 
high self-concept child's interpretations of 
success experiences should produce en- 
hanced self-confidence, promote future ap- 
proach-type behavior to achievement tasks, 
as well as increase the value of future 
achievement activities (Weiner, 1974). 
However, the experience of failure in com- 
petitive settings that resulted in depressed 
beliefs in their own ability, self-criticism, and 
markedly discrepant self-other perceptions 
is likely to affect negatively the child's own 
feelings of competence and self-worth and 
potentially interfere with future relation- 
ships with the other. Nevertheless, the be- 
havior of the high self-concept child appears 
to be success oriented, responding to 
achievement events with "origin"-like be- 
havior (de Charms, 1968). These children 
seem to have developed a set of decision rules 
for processing and interpreting informa- 
tional cues and a positive self-reinforcement 
system, both of which ought to contribute to 
further self-concept development. 

The prognosis for the low self-concept 
child, however, is not nearly as favorable. In 
contrast to the high self-concept child, the 
low self-concept child does not appear to use 
a set of decision rules for using performance 
feedback as information about his or her own 
competency in achievement settings. The 
absence of any positive reaction to successful 
outcomes may reflect a resistance by these 
children to modify their negative self-ap- 
praisal (Jones, 1973) or an avoidance of any 
obligation to perform well in the future 
(Maracek & Mettee, 1972; Mettee, 1971). 
While the low self-concept children have not
learned to give themselves positive self-re- 
inforcement for their accomplishments, they 
seem to have internalized a self-punishment 
response for poor performances. The self 
theory of the low self-concept child seems to 
have an underdeveloped cognitive compo- 
nent (beliefs about the covariation of skill 
and outcome) but a strong negative affective 
component (self-criticism or punishment) 
that becomes manifest under certain stimulus
conditions.

These conclusions have important implications
for self-concept training programs.
It is rather clear from the data that merely
arranging success experiences or removing
some negative consequences of failure (as in
the noncompetitive conditions) is not sufficient
for enhancing self-esteem in the low
self-concept child. In support of this con- 
tention, Dweck (1975) found that children 
identified as learned helpless evidenced a 
continual decline in performance even after 
a series of success experiences. Since an 
attribution paradigm assumes that persons' 
interpretations of achievement events pre- 
cede and determine subsequent behavior, 
change programs within this framework 
would focus on training children to process 
and interpret behavioral data in new ways 
(e.g., see Dweck, 1975). An alternative
paradigm based on social learning theory 
emphasizes the need for developing a posi- 
tive self-reinforcement system. Both par- 
adigms are cognitively oriented in that they 
involve teaching the low self-concept child 
new concepts for responding to external in- 
formational cues and to his or her own be- 
havior. 

Finally, the present results indicate the 
importance of studying attributional be- 
havior within various social contexts. 
Feather and Simon (1971) noted that social 
situations seem to elicit different forms of 
cognitive bias than are present in nonsocial 
settings; it appears that the operating form 
of cognitive bias in achievement settings is 
dependent on the reward structure of the 
situation as well as on certain predisposing 
cognitive-personality factors. Future consideration
should be given to specifying the
exact nature of cognitive bias elicited by 
other reward structures and to clarifying the 
iors.
This study investigated the effects of Teams-Games-Tournament (TGT), an
instructional strategy employing biracial learning teams and instructional 
games, on cross-racial friendship in integrated classes. Four experiments 
comparing TGT and control treatments are reviewed, involving a total of 558 
students in Grades 7-12. Sociometric measures were used to assess TGT ef- 
fects on the number of cross-racial friendship choices and the percentage of 
cross-racial choices over all choices made. Chi-square analyses showed positive 
TGT effects (p < .05) on 7 of 13 measures of numbers of cross-racial choices 
and 3 of 13 on percentage of cross-racial choices over all choices. These results
indicate that TGT is an effective means of increasing cross-racial friendship
in integrated classes.


In 1954, the Brown vs. Board of Education
U.S. Supreme Court ruling initiated one
of the most important social changes of our
time, the legal desegregation of our nation’s
schools. However, while the racial composition
of classrooms has changed, social integration
of minority groups remains minimal
(Dorr, 1972; Gerard & Miller, 1975).
Clearly, increased interracial contact is a
necessary but not sufficient condition for
creating more harmonious race relations.

Reviews of the race relations literature
(Amir, 1969; Pettigrew, Useem, Normand,
& Smith, 1973) have suggested reasons why
merely creating desegregated classrooms is
not sufficient for improving race relations
among students. One important condition
for constructive race relations cited by the 
reviewers is the creation of interdependen- 
cies among students from various racial 
groups. One way to structure such inter- 
dependencies in the classroom is by creating 
multiracial student work groups (or teams) 
in which all teammates share rewards. 

The idea of using multiracial student 
teams to improve race relations is not new. 
Allport (1954) suggested that if students 
were assigned to multiracial cooperative 
learning teams, they would learn to like and 
help one another. This hypothesis is sup- 
ported by a long tradition of research indi- 
cating that persons placed in a cooperative 
reward structure, in which each group 
member's efforts help the group to be re- 
warded, come to like and help one another 
more than do members of groups that are not 
rewarded based on group performance (for 
reviews, see Johnson & Johnson, 1974; Slavin,
1977a). This probably occurs for two
reasons. First, simply increasing contact
between people may increase their mutual
attraction (Lott & Lott, 1965). Because
teams tend to enormously increase interpersonal
interaction (see, e.g., Slavin, in
press), sheer increased contact may explain
the effect of cooperative teams on mutual
attraction. Second, there is evidence that
when individuals help one another to be rewarded,
they become attracted to one another
(Goranson & Berkowitz, 1966). In a
cooperative team, each team member’s 
goal-directed behavior helps the group attain 
its goal, thus helping the other group mem- 
bers to be rewarded. Thus, these goal-di- 
rected behaviors (as well as the person pro- 
ducing them) come to be positively evaluated 
by the group. 

If the principle that cooperative teams 
increase mutual attraction is true in general, 
it should be particularly useful for defusing 
racial tensions and increasing cross-racial 
friendships in desegregated classrooms. 

As logical as this appears, only recently 
have researchers examined the impact of 
cooperative learning teams in classrooms on 
cross-racial attraction. Aronson et al. (1975) 
used a system called “Jigsaw Teaching” for 
this purpose in several elementary class- 
rooms composed of black, Anglo, and Chi- 
cano students in Austin, Texas. They found 
greater mutual attraction in their coopera- 
tive Jigsaw teams than in competitive con- 
trol groups. Weigel, Wiser, and Cook (1975) 
used a more general team technique and 
found increased cross-ethnic helping be- 
havior in mixed black, Chicano, and Anglo 
secondary classes in Denver, Colorado. Sla- 
vin (1977b) used a technique called “Stu- 
dent-Teams-Achievement Divisions" to 
increase cross-racial liking and helping in a 
Baltimore junior high school. These studies
varied considerably in methodology, but all 
contained one essential element: multi- 
ethnic, cooperative teams interacting on 
learning tasks for extended periods (at least 
6 weeks). 

This article reports the results of four 
studies evaluating the effects of a fourth 
team technique on cross-racial attraction. 
This technique is called “Teams-Games-Tournament,”
or TGT. TGT is unique
among the techniques used to improve
relations among students of different races
for several reasons. First, it is the most extensively
researched. Second, it is the team
technique for which impact on academic
achievement as well as effects on cross-racial 
attraction have been most consistently re- 
ported. DeVries and Slavin (in press) re- 
view 10 field experimental studies on TGT, 
7 of which show significant TGT effects on 
academic achievement in such areas as 
mathematics, language arts, and reading. 

The essential features of TGT are student 
teams and academic tournaments. Each 
team has four to five members from all 
achievement levels, both sexes, and different 
ethnic groups. The teammates sit together 
at all times. After the teacher makes an 
initial class presentation, the teams engage 
in peer-tutoring practice sessions. Then, 
usually at the end of the week, a class tour- 
nament is held. The tournament consists of 
a set of skill exercise games. During the 
tournaments, students compete individually 
as representatives of their teams against two 
students of comparable ability from other 
teams. At the end of the tournament, stu- 
dents’ scores are summed to form team 
scores. These scores are reported in a class 
newsletter prepared by the teacher and dis- 
tributed to all students. A more complete 
description of TGT is available in the TGT 
teacher's manual (DeVries, Edwards, & 
Fennessey, Note 1). 
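
The scoring step described above (individual tournament scores summed into team scores) can be sketched as follows. This is only an illustration: the student names, team labels, and point values are hypothetical, and the actual point scale used in TGT tournaments is not given in this article.

```python
from collections import defaultdict


def team_scores(results):
    """Sum each student's tournament points into a total for his or her team.

    `results` is a list of (student, team, points) tuples. All names and
    point values used below are hypothetical.
    """
    totals = defaultdict(int)
    for _student, team, points in results:
        totals[team] += points
    return dict(totals)


# Hypothetical tournament results for two four-member teams.
results = [
    ("Ana", "Team A", 6), ("Ben", "Team A", 4),
    ("Cal", "Team A", 2), ("Dee", "Team A", 6),
    ("Eve", "Team B", 2), ("Fay", "Team B", 6),
    ("Gus", "Team B", 4), ("Hal", "Team B", 4),
]
scores = team_scores(results)
print(scores)
```

Because only the team totals are reported in the class newsletter, every member's tournament points contribute to the same public score.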

Thus, TGT consists of within-group co- 
operation (teams) and out-group competi- 
tion (tournaments). Some researchers have 
found that while team competition increases 
in-group cohesion, it also increases out-group 
hostility (see Sherif, White, & Harvey, 1955). 
If this were true, team competition would be 
an inappropriate means of increasing cohe- 
sion in integrated classrooms, as there could 
be more new hostilities created than new
friendships. However, as Weigel et al. 
(1975) point out, it is unclear to what extent 
team competition effects on out-group hos- 
tility hold true. They distinguish between 
interactive competition, in which teams 
confront each other directly, and compara- 
tive competition, in which teams do their 
own work which is then compared with that 
of other teams. The team competition in 
TGT is clearly of the comparative type, 
which Weigel et al. hold does not create 
out-group hostility. Further, the practical 
advantages of team competition outweigh 
the possible disadvantages. More than four
fifths of the class time in TGT is spent in 
purely cooperative activity within the team. 
It is unlikely that the time spent in individ- 
ual or team competition would have a greater 
impact on student behavior than this much 
greater time spent in cooperation. 

This article reviews four studies that 
represent a wide-ranging test of the impact 
of TGT on interracial attraction. The 
studies vary in (a) experimental design, (b) 
demographic characteristics of the student 
populations, (c) length of intervention, and 
(d) measures of cross-racial attraction. 
These studies are limited to evaluation of a 
particular intervention rather than cooper- 
ative team interventions in general, but TGT 
is of special interest because of its earlier 
documented effects on academic perfor- 
mance and other variables (DeVries & Sla- 
vin, in press). 


Method 


The four field experiments were conducted in a wide
variety of school settings. The experiments differed on 
geographical area (east coast and southeast United 
States), grade (seventh through twelfth), subject areas 
(mathematics, social studies, science, and English), and 
percentage of black students (ranging from 10% to 51%). 
They also varied in experimental design, level of random 
assignment, and measures employed. A total of 558 
students served as subjects in the four studies.

In Experiment 1, four intact seventh-grade mathe- 
matics classes in Baltimore, Maryland were randomly 
assigned to one of two treatments, TGT or traditional 
control, for a 9-week period. Thirty percent of the stu- 
dents were black. The TGT treatment placed students 
on four-member, racially mixed teams. Each team 
competed against other teams on simple instructional
games that were played in twice-weekly game tournaments.
TGT and control students studied the same
academic material, but the control classes were char- 
acterized by individual competition between students 
for grades on traditional quizzes. The same white, female
teacher taught all four classes. The measure of
race relations (a sociometric questionnaire) was ad- 
ministered before and after treatment and involved 
asking the students to list the names of classmates (a) 
whom they considered their friends in school and (b) 
who had helped them with their classwork. Each stu- 
dent’s response to each item was coded for (a) the 
number of cross-race choices and (b) the number of 
within-race choices.

Experiment 2 involved stratified random assignment
of individual students to treatment groups. Stratification
was based on achievement level, race, and sex.
The experimental design included three treatment 
groups: a standard TGT treatment involving cooperation
within teams and competition across teams ( classes),
a TGT-cooperative treatment emphasizing the
within-team cooperation component without team
competition (one class), and a traditional control group
(one class). The control class experienced conventional
instruction, including frequent quizzes paralleling the 
TGT games. All students experienced the treatments 
both in their English classes and in their social studies 
classes. One white mathematics teacher and one white 
social studies teacher taught all classes. The students, 
who were seventh graders in a Baltimore junior high 
school, experienced the program for 12 weeks. Fifty- 
one percent of the students were black. 
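
Stratified random assignment of the kind used in Experiment 2 can be illustrated with a minimal sketch. The student records, field names, and equal group sizes below are assumptions made for the example; the article does not describe the procedure in this detail, and the actual TGT group was larger than the other two.

```python
import random
from collections import defaultdict
from itertools import product


def stratified_assign(students, groups, seed=0):
    """Randomly assign students to groups separately within each stratum.

    A stratum is one combination of achievement level, race, and sex.
    Within a stratum the students are shuffled and then dealt out to the
    groups in rotation, so each group gets about the same number of
    students from every stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in students:
        strata[(s["achievement"], s["race"], s["sex"])].append(s)
    assignment = {}
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        for i, s in enumerate(members):
            assignment[s["name"]] = groups[i % len(groups)]
    return assignment


# Hypothetical roster: 8 strata with 3 students in each.
profiles = list(product(("high", "low"), ("black", "white"),
                        ("female", "male"), range(3)))
students = [
    {"name": f"S{i}", "achievement": a, "race": r, "sex": x}
    for i, (a, r, x, _) in enumerate(profiles)
]
groups = ["TGT", "TGT-cooperative", "control"]
assignment = stratified_assign(students, groups)
group_sizes = {g: list(assignment.values()).count(g) for g in groups}
print(group_sizes)
```

Dealing shuffled students out in rotation keeps the three treatment groups comparable on the stratifying variables while leaving the assignment of any one student to chance.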

In this experiment, sociometric items were adminis- 
tered as a posttest only. Students were asked to select 
classmates for each of the following categories: best 
friends, friends outside of school, friends in school, 
would work with/go to for help, and helped you. The 
sociometric items were designed to vary in social dis- 
tance for both the task (helping) and friendship di- 
mensions. As in Experiment 1, the number of cross- 
race and number of within-race choices made by each 
student were calculated for each sociometric item. 
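
The coding of sociometric responses into cross-race and within-race counts might look like the sketch below. The names, race labels, and nominations are invented for illustration; only the counting rule comes from the text.

```python
def code_choices(chooser, chosen, race):
    """Return (cross_race, within_race) counts for one student's answer
    to one sociometric item.

    `chosen` is the list of classmates the student named, and `race`
    maps every student name to a race label.
    """
    cross = sum(1 for c in chosen if race[c] != race[chooser])
    within = sum(1 for c in chosen if race[c] == race[chooser])
    return cross, within


# Hypothetical class of four students and their "friends in school" answers.
race = {"Ana": "black", "Ben": "white", "Cal": "white", "Dee": "black"}
friends_in_school = {
    "Ana": ["Ben", "Dee"],
    "Ben": ["Cal"],
    "Cal": ["Ana", "Ben", "Dee"],
    "Dee": [],
}
coded = {s: code_choices(s, named, race) for s, named in friends_in_school.items()}
print(coded)
```

Summing the first element of each pair over a class gives the number of cross-race choices for that item; the two elements together give the total number of choices.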
Experiment 3 involved six intact high school social 
studies classes in a mid-sized Florida city. Classes were 
randomly assigned to treatment conditions. The design 
involved a simple two-groups comparison, TGT versus 
a traditional control group. Each of three white 
teachers taught two experimental and one control group 
for 12 weeks. Ten percent of the students were black. 
The control group in this experiment studied individual 
worksheets and took individual quizzes covering the 
same material as that taught in the TGT classes. 

The sociometric measures were administered both 
before and after treatment. The following dimensions 
were assessed: friends outside school, friends in school, 
who would you work with or go to for help, and who has
helped you. Because the classes consisted of only about
10% black students, only the cross-race and within-race 
choices received by the black students were analyzed. 
Including all of the white students in the analysis would 
have introduced a large quantity of within-race choices 
that would have obscured possible changes in social 
integration experienced by the black students.

Experiment 4, previously reported by DeVries and
Edwards (1974), involved a 2 X 2 factorial design in 
which the factors were task (quiz vs. game) and reward 
(team vs. individual). In the quiz groups, students took 
individual quizzes; in the game groups, students en- 
gaged in the TGT tournaments. The team students 
worked in four- to five-member groups, while the indi- 
vidual students worked alone. Thus, the team game 
treatment was identical to TGT. For this article, the
relevant comparison from Experiment 4 is that between 
the team conditions (team quiz and team game) and the 
individual conditions (individual quiz and individual 
game). This study involved students randomly as- 
signed to each of four classes and rotated the four white 
teachers across treatments at the midpoint of the 4-week
study to control for teacher effects. The students
were Baltimore seventh graders, 43% of whom were black.

It is important to note that none of the participating
teachers were aware of the hypothesis concerning TGT
effects on race relations. All four experiments were 
considered by both participating teachers and the ex- 
perimenters to be focused on major learning and at- 
titudinal outcomes. In fact, the main question at the 
time of the studies was whether TGT could create 
greater student performance on standard academic 
tasks. Although a variety of demand characteristics can
produce confounded results in any social psychological 
experiment (Orne, 1962), the four reported in this article 
appear to be relatively free of such confounding fac- 
tors. 


Results 


Analysis of the data from each of the four 
experiments involved chi-square tests for 
association (Winer, 1962). Two related but
conceptually distinct questions were asked 
of each data set. First, were there greater 
increases in the number of cross-race choices 
made by experimental (TGT) students than 
control? This question measures the 
amount of cross-racial attraction in the class, 
which could increase as a result of an in- 
crease in total cross-race and within-race 
choices. Second, were there greater in- 
creases in the percentage of cross-race 
choices out of all choices made by experi- 
mental than control students? This ques- 
tion indicates the degree to which race has 
ceased to be a barrier to attraction, control- 
ling for the total number of choices made. 
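
The difference between the two questions can be made concrete with a small sketch. The counts are hypothetical and were chosen so that the number of cross-race choices doubles while the percentage stays the same.

```python
def cross_race_summary(cross, within):
    """Return (number of cross-race choices, percentage of all choices)."""
    total = cross + within
    pct = 100.0 * cross / total if total else 0.0
    return cross, pct


# Hypothetical class totals before and after a treatment.
pre = cross_race_summary(cross=10, within=20)
post = cross_race_summary(cross=20, within=40)
print(pre, post)
```

Here the first measure shows an increase (10 to 20 choices) while the second does not move, which is why the two questions are analyzed separately.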

The first question was addressed by
means of 2 × 2 contingency tables in Experiments
1 and 3 with Factors A (pre-post)
and B (TGT vs. control). The numbers of
cross-race choices were the cell entries. For
these experiments, only the A × B effects
were of interest in the analysis. In Experiment
2, random assignment at the individual
level enabled the calculation of a 3 × 1 chi-square
(TGT vs. TGT cooperative vs. control),
where expected frequencies were equal
in each cell. Random assignment in Experiment
4 permitted interpretation of A, B,
and A × B (interaction) effects for the two
experimental factors.
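
As a worked illustration of the 2 × 2 analysis, the sketch below computes a chi-square for the Experiment 1 “helped you” counts listed in Table 2 (TGT: 11 pre, 34 post; control: 33 pre, 34 post). The article does not say whether a continuity correction was applied; the Yates correction used here is an assumption, offered only because the corrected value happens to agree with the reported 5.95.

```python
def chi_square_2x2(table, yates=False):
    """Pearson chi-square (1 df) for a 2 x 2 table of counts.

    `table` is [[a, b], [c, d]]. With yates=True, 0.5 is subtracted from
    each |observed - expected| before squaring (continuity correction).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    return chi2


# Rows: TGT, control. Columns: pretest, posttest.
helped_you_exp1 = [[11, 34], [33, 34]]
uncorrected = chi_square_2x2(helped_you_exp1)
corrected = chi_square_2x2(helped_you_exp1, yates=True)
print(round(uncorrected, 2), round(corrected, 2))
```

The statistic tests whether the pre-to-post change in cross-race choices differs between the TGT and control classes, which is the A × B effect of interest.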

The second question was addressed in a
similar fashion, with the addition of a
within-race versus cross-race factor in each
analysis. Interest in this case is in an A × B
× C effect in Experiments 1 and 3; an A × B
effect in Experiment 2; and A × C, B × C,
and A × B × C effects in Experiment 4
(where Factor C is within- vs. cross-race
choices).

The results of the analyses for the four 
experiments are summarized in Table 1. 
Chi-squares for both the number and per- 
centage of cross-race choices are presented 
for each of the six sociometric dimensions. 
A blank cell in the table indicates that the 
specific sociometric variable was not mea- 
sured in the experiment. Numbers of 
cross-race choices and percentages of cross- 
race choices over all choices are presented in 
Table 2. 

For Experiment 1, significantly positive
TGT effects on both the number and percentage
of cross-race choices were found for
the “helped you” question but not for
“friends in school”: For number of choices,
χ²_AB(1) = 5.95, p < .05; for percentage of
choices, χ²_AB(1) = 5.07, p < .05.

In Experiment 2, different effects were
obtained for the number and percentages of
cross-race choices made. The TGT students
chose a significantly higher percentage of
opposite-race students relative to same-race students
on “best friends,” χ²(2) = 7.13, p < .05, and
marginally more on “friends outside school,”
χ²(2) = 5.31, p < .10, and “who would you
like to work with,” χ²(2) = 5.29, p < .10. On
the other hand, positive TGT effects on the
number of cross-race choices made were
found for “friends in school,” χ²(2) = 14.24,
p < .01, and “helped you,” χ²(2) = 11.91, p
< .01. In fact, even though there were either
number or percentage effects on all five sociometric
dimensions, in no case were the
number and percentage effects on the same
dimension.

In Experiment 3, only choices received by
blacks were analyzed due to the small number
of blacks in the classes. Significantly
positive TGT effects were found for number
of cross-race “friends in school,” χ²(1) =
16.11, p < .01, and “would work with,” χ²(1)
= 7.29, p < .01, and marginal effects were
found for “helped you,” χ²(1) = 2.86, p < .10.
Marginally positive TGT effects on the
percentage of cross-race choices made were
found for “friends in school,” χ²(1) = 3.77,
p < .10.

Experiment 4, the 2 × 2 (Team × Game)
design, demonstrates positive team results
Table 1
Chi-Square Values for Tests of Treatment Effects on Number and Percentage of Cross-Race
Choices

Sociometric                                                          Experiment 3     Experiment 4
dimension                         Experiment 1     Experiment 2     (blacks only)    (teams vs. individuals)

Best friends
  Number                                           [illegible]
  Percentage                                       7.13**
Friends outside school
  Number                                           [illegible]      [illegible]
  Percentage                                       5.31*            <1
Friends in school
  Number                          1.66             14.24***         16.11***         8.50***
  Percentage                      1.99             1.51             3.77*            2.44
Would work with/go to for help
  Number                                           <1               7.29***
  Percentage                                       5.29*            [illegible]
Helped you
  Number                          5.95**           11.81***         2.86*            15.78***
  Percentage                      5.07**           2.02             <1               5.70**

Note. df = 1 for Experiments 1, 3, and 4; df = 2 for Experiment 2.
* p < .10.
** p < .05.
*** p < .01.


on the number of cross-race choices made by
students for both “friends in school,” χ²(1)
= 8.50, p < .01, and “helped you,” χ²(1) =
15.78, p < .01. Significant team effects on
the percentage of cross-race choices were
found for “helped you,” χ²(1) = 5.70, p < .05.
These results differ slightly from those re- 
ported in Experiment 4 by DeVries and 
Edwards (1974). This discrepancy is due to 
their use of a log-linear chi-square model. 
The present analysis uses a simplified model 
for the sake of comparability with the other 
three experiments. 
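
The simplified model is not spelled out in the text. One reading, sketched below as an assumption, is a one-way chi-square with equal expected frequencies applied to the team total versus the individual total. Using the Experiment 4 counts from Table 2 (team quiz, team game, individual quiz, individual game), this reading gives values that agree with the reported 8.50 and 15.78.

```python
def chi_square_equal_expected(counts):
    """One-way chi-square with the same expected frequency in every cell."""
    expected = sum(counts) / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)


def team_vs_individual(cells):
    """Collapse (team quiz, team game, individual quiz, individual game)
    counts into team and individual totals and test them against an even split."""
    team_quiz, team_game, indiv_quiz, indiv_game = cells
    return chi_square_equal_expected([team_quiz + team_game,
                                      indiv_quiz + indiv_game])


# Numbers of cross-race choices in Experiment 4, as listed in Table 2.
friends_in_school = (59, 50, 41, 29)
helped_you = (21, 30, 4, 14)
chi_friends = team_vs_individual(friends_in_school)
chi_helped = team_vs_individual(helped_you)
print(round(chi_friends, 2), round(chi_helped, 2))
```

The same `chi_square_equal_expected` function covers the 3 × 1 comparison described for Experiment 2 when it is given three group counts, which yields a test with 2 degrees of freedom.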

In summary, significantly positive TGT 
effects on the number of cross-race choices 
made by students were found in 7 of the 13 
instances in which sociometric dimensions 
were measured across the four experiments. 
Two marginally significant effects were also 
found. No effects in favor of the control 
conditions were found. In the case of the 
percentage of cross-race over total choices, 
3 of the 13 comparisons showed significant 
effects, and 3 more were marginally signifi- 
cant. Again, no effects were found in favor 
of the control groups. TGT effects were
obtained about as often for the more intimate
friendship questions (“best friends”
and “friends outside of school”) as for the
less intimate friendship dimensions (“friends
in school” and “would work with”) or even
the entirely task-related sociometric dimension
(“helped you”).


Discussion 


Given these results, it is clear that TGT is 
more effective than control treatments in 
increasing both the number and percentage 
of cross-racial sociometric choices. This 
article reports all four studies in which TGT 
has been conducted in integrated schools. 
Thus, there are no schools in which some 
TGT effect on both the number and per- 
centage of cross-race choices failed to be 
demonstrated. On the other hand, in none 
of the four schools were number and percentage
effects found on all variables measured.

One possible explanation for the lack of 
positive TGT effects on the percentage of 
cross-race choices in several instances when 
the effects on the number of choices are 
highly significant is a ceiling effect. The 
TGT treatment usually produces a sub- 
stantial increase in friendship and helping 
choices. On several of the posttest mea- 
sures, TGT students chose classmates of the 
Table 2
Number and Percentage of Cross-Race Choices 
Experi rs m Experiment 3 
] é rimen: xperiment 2 (blacks only) Experiment 4 
ies TGT Control TGT- Con-  TGT Control USE Tndividual 
ension Pre Post Pre Post TGT C trol Pre Post Pre Post Quiz Game Quiz Game 
Best friends 
em 35 36 30 
Percentage 30 43 
Friends outside "i 
school 
ee 9115091 1777. 
ercentage 42 48 3 
Friends in school à 
Number 59 51 77 94 246 177 28319 69 28 2 % 
1 59 50 41 29 
Percentage 35 26 35 34 471.51 4753 
MESI UM 87 76 81 37 34 31 27 
go to for help 
Number 38120892 97. 75.—21 -d4- 18 
Percentage . 94 48. 392:33:9 79.1 Clee 72 
Helped you 
Number 11 34 33 34 83 47 75 3 34 6 14 21 30 4 14 
Percentage 20:49 "ES lieben: 50 51 4443 81 86 74 34 54 20 29 


Note. TGT indicates the group receiving the Teams-Games-Tournament treatment only; TGT-C indicates the group receiving
the TGT-cooperative treatment.


opposite race as friends and work mates as 
often or nearly as often as they would have 
if race were not a criterion for friendship or 
helping at all. This no-bias expectation is 
43% in Experiment 1, 52% in Experiment 2, 
92% in Experiment 3, and 49% in Experi- 
ment 4. TGT classes were within 5 per- 
centage points of the no-bias expectation on 
at least one measure in all four studies. In 
such cases, increases in the total number of 
choices would be expected to increase the 
number of cross-race choices but not the 
percentage of choices. In fact, of the six 
instances in which TGT classes reached the 
no-bias criterion, only two showed signifi- 
cant percentage effects (p < .05), although 
five showed significant number effects. 
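
The ceiling argument can be made concrete with a small calculation. The text does not state how the no-bias expectation was derived, so the sketch below assumes the simplest definition: every student is equally likely to name any classmate, so the expected cross-race share equals the share of chooser-classmate pairs that cross racial lines. The class composition and the numbers of choices per student are hypothetical.

```python
def no_bias_share(n_black: int, n_white: int) -> float:
    """Expected proportion of cross-race choices if race played no role.

    Assumption: each student chooses uniformly among the other
    (n_black + n_white - 1) classmates.
    """
    total = n_black + n_white
    cross_pairs = 2 * n_black * n_white   # ordered chooser -> chosen pairs across race
    all_pairs = total * (total - 1)       # all ordered chooser -> chosen pairs
    return cross_pairs / all_pairs


def expected_cross_race(n_black: int, n_white: int, choices_per_student: int):
    """Return (expected number, expected percentage) of cross-race choices."""
    share = no_bias_share(n_black, n_white)
    total_choices = (n_black + n_white) * choices_per_student
    return total_choices * share, 100.0 * share


# Hypothetical class of 13 black and 17 white students at the no-bias level.
for k in (3, 5):
    number, percentage = expected_cross_race(13, 17, k)
    print(f"{k} choices per student: {number:.1f} cross-race choices, {percentage:.1f}%")
```

When a class is already at the no-bias level, raising the number of choices per student raises the expected number of cross-race choices while the expected percentage stays fixed, which is the pattern the paragraph describes.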
The relatively strong and consistent TGT 
effects on the number of cross-race choices 
indicate that TGT can increase the amount 
of cross-race friendship and helping, either 
because race is less of a barrier to sociometric 
choice or as part of a general increase in the 
number of friends and work mates claimed 
by all students regardless of race. For 
practical purposes, the latter finding may be 
the more important. If nomination on a 


sociometric measure has any behavioral 
correlates, an increase in cross-racial choices 
indicates an increase in the likelihood that 
black students will have a substantial num- 
ber of white friends and vice versa. If mis- 
understanding and hostility between racial 
groups are a product of limited communi- 
cation or friendship between members of 
different races, then TGT and related team 
techniques may—by increasing the number 
of cross-race friendships—contribute to a 
diminution of racial tensions in schools. 
One major implication of the present re- 
search is that interracial friendship choices 
can be modified by means of a restructuring 
of reward systems and interaction patterns 
in classrooms. In none of the four experiments
were teachers aware that race relations
were being examined; in every case,
TGT was presented to teachers as a technique
aimed at increasing students’ academic
achievement. Thus, there was no
direct personal effort to influence racial at- 
titudes. The effects observed can be at- 
tributed entirely to the placement of black 
and white students on cooperative teams. 
The greatest significance of this research 
is in its clear message to educators. The
results obtained by Aronson et al. (1975), by 
Weigel et al. (1975), by Slavin (1977b), as 
well as the present article support the use of 
biracial teams in classrooms to break down 
racial barriers to friendship and to increase 
cross-racial friendship and helping. A large 
body of research on TGT (summarized by 
DeVries & Slavin, in press) has demon- 
strated effects of TGT on academic 
achievement as well as attitudinal variables 
other than racial attitudes. That is, this 
particular team reward system offers to 
teachers the opportunity to improve both 
the academic performance and the cross- 
racial friendship and helping of their stu- 
dents. While continued research is still 
necessary to identify parameters, limita- 
tions, and modifications of team reward 
systems, the results obtained to date are well 
enough established to recommend their use 
in biracial classrooms. 
Broader Transfer Produced by Guided Discovery of
Number Concepts with Preschool Children 


Aletha Solter and Richard E. Mayer 
University of California, Santa Barbara 


In two experiments, 43 preschoolers learned the concept of one-to-one corre- 
spondence to identical behavioral criteria by matched discovery, expository, 
or observation methods of instruction. Performance on a subsequent post- 
test, administered by a “blind” experimenter, revealed a pattern in which the 
groups did not differ on short-term recall and near transfer, but the discovery
group excelled on far transfer (conservation) and delayed recall. The effect
of guided discovery on the acquisition of broader learning outcomes was discussed.


Discovery methods of instruction re- 
ceived much attention during the 1960s. 
The promises of “meaningful” learning 
outcomes, superior transfer, and longer re- 
tention were particularly attractive to edu- 
cators (Bruner, 1961; Shulman & Keislar, 
1966). However, consistent empirical sup- 
port for the claims failed to materialize 
(Wittrock, 1966), and there has been a lack 
of agreement on how to define the concept of 
discovery or relate it to a useful theory of 
instruction (Strike, 1975). 

More recently, Mayer (1975) has sug- 
gested a theory of instruction based on the 
idea that meaningful learning depends on 
the satisfaction of at least three conditions: 
(1) reception—the learner must be presented 
with and pay attention to the to-be-learned 
material, (2) availability—a set of related 
experiences must be available in the learner’s 
long-term memory to serve as an assimilative 
set, and (3) activation—the assimilative set 
must be actively processed during learning. 
According to this view, discovery methods of 
instruction might be supposed to have their 
main effect on Condition 3 by encouraging
subjects to actively search and process their
existing meaningful knowledge and relate it
to ongoing learning.

There are several situations in which discovery
methods of instruction might fail to
achieve the goal of broader learning out- 
comes. Even though the subject may ac- 
tively search his existing knowledge and try 
to relate it to what is presented (Condition 
3), if the subject fails to discover the to-be- 
learned principle (Condition 1), no learning 
can occur. This situation provides one in- 
terpretation of the many instances in which 
“guided discovery” results in more learning 
or better transfer than pure discovery (For- 
gus & Schwartz, 1957; Gagné & Brown, 1961; 
Wittrock, 1963). The present study at- 
tempted to overcome this problem by pro- 
viding that all subjects (in discovery and in 
expository groups) learned to the same be- 
havioral criterion of mastery, that is, by 
providing that all subjects achieved Condi- 
tion 1. 

A second situation in which discovery 
techniques might fail to deliver broader 
transfer and retention occurs if the discovery 
subjects do not have a meaningful set of re- 
lated past experiences (Condition 2). Dis- 
covery methods that presumably encourage 
active search of existing knowledge (Condi- 
tion 3) will be of little value if no useful re- 
lated knowledge exists in memory (Condi- 
tion 2). For example, Mayer, Stiehl, and 
Greeno (1975, Experiments 3 and 4) found 
that although all subjects could learn to solve 
simple binomial probability problems by 
discovery, only the group that received pre- 
training in basic underlying concepts per- 
formed better on a subsequent transfer test 
involving integrative problems. In addition,
Egan and Greeno (1973) obtained Aptitude
× Treatment interactions in which the test
performance of rule-method subjects was not 
affected by their scores on pretests for pre- 
requisite concepts, while the test perfor- 
mance of discovery learners was positively 
related to their level of prerequisite concepts. 
These results suggest that the availability of 
prerequisite concepts (Condition 2) is es- 
sential for discovery learning but not for rule 
methods. In order to overcome this problem 
in the present experiment, a task (one-to-one 
correspondence) was chosen for which chil- 
dren in the sample were likely to have had 
some prerequisite concepts (such as previous 
playing with concrete objects and one-to-one 
matching in everyday life). 

Finally, even if Conditions 1 and 2 are met 
(as noted above), discovery techniques will 
not produce different learning outcomes 
than expository methods if expository 
subjects are able to actively search and in- 
tegrate old knowledge with new (Condition 
3). For example, this third situation is 
consistent with Ausubel’s (1968) claim that 
there may be many cases in which expository 
instruction can lead to meaningful learning. 
There is some evidence that instructional 
methods that serve to activate a learning set 
(Condition 3) may be particularly important 
for children, since they may not have developed
assimilative learning strategies for expository
instruction. In the present experiment,
preschoolers are used as subjects in
an attempt to assure that subjects will not
generally have developed strategies for actively
integrating information that is presented
by expository methods.

Several investigators have studied the
effects of discovery methods for instructing
children in mathematical concepts (Anastasiov,
Sibley, Leonhardt, & Borisch, 1970;
Olander & Robertson, 1973; Peters, 1970).
The results, as with those cited above, are
contradictory. Much of the discrepancy
may be accounted for by the non-uniform
manner in which the learning outcomes have
been evaluated. Also, the concept of discovery
itself is rather vague and has included
a wide range of very diverse instructional
strategies.

Besides discovery training, another manner
in which the assimilative set might be
activated during learning is to allow the
learner to manipulate concrete materials.
Piaget (1965) has claimed that mathematical
concepts can only be learned if the subject
has an opportunity to manipulate real
“concrete” objects. A major problem confronting
the discovery issue is to clearly
separate “discovery” from “active manipulation”
in order to ascertain the contributions
of each to learning.

In order to investigate the effect of dis- 
covery training and concrete manipulation, 
preschool children were given training in 
one-to-one correspondence. In the first 
study, a discovery method was compared to 
a matched expository method, both of which 
involved active manipulation of objects. 
The second study replicated the first study 
using different materials and included a 
third group (observation training) in which 
subjects did not manipulate objects. The 
learning outcomes were evaluated by tests of 
short-term recall, near transfer, long-term 
recall, and far transfer. This study attempts 
to reconcile some inconsistencies in the dis- 
covery learning literature by carefully mea- 
suring the learning outcome and by using a 
situation in which discovery subjects learn 
to a mastery criterion (Condition 1), are 
likely to possess some prerequisite concepts 
(Condition 2), and are not likely to otherwise 
actively integrate old and new material 
(Condition 3). 

The assimilation theory cited above
suggests several predictions that were tested
in the present study. Discovery subjects 
should connect the new skill (one-to-one 
correspondence) with existing concepts, 
while expository subjects might simply add 
the new behavior without integrating it. 
Since both groups learned to the same cri- 
terion, performance on short-term retention 
should be similar for both groups; however, 
the broader learning outcome of the discov- 
ery group should result in superior perfor- 
mance on problems requiring far transfer of 
the learned material to novel situations and 
on long-term retention. An alternative hy- 
pothesis is that since both groups reach the 
same level of learning, they have learned the 
same thing and should perform similarly on 
all tests. In addition, the present experiments
will provide information on whether
training with manipulation (discovery and 
expository training) results in broader 
learning than training involving no manip- 
ulation (observation training). 


Experiment 1 


Method 


Subjects. The subjects were 19 children between the
ages of 3 years 6 months and 5 years 1 month, who at- 
tended a private nursery school near Santa Barbara, 
California and who failed a pretest for one-to-one cor- 
respondence (out of a larger group of 62 pretested 
children). They came primarily from white, middle- 
class, and upper middle-class homes; written parental 
permission was obtained for all subjects. 

Design. Subjects were divided into two groups, with 
10 subjects in the discovery training group and 9 
subjects in the expository training group. All subjects 
were tested on the same four posttests, so that comparisons
by type of posttest are within-subjects comparisons.

Procedure. Each subject participated individually 
in four sessions, sitting opposite the experimenter at a 
small table in the nursery school. Following each ses- 
sion, a small colored star was given as a reward for par- 
ticipating. 

In Session 1, a pretest for one-to-one correspondence 
was administered to all subjects. Those who passed 
were eliminated from the study. In Session 2 (2 days 
later), those who failed the pretest were given one of two 
training programs for one-to-one correspondence. 
Subjects were grouped in pairs with an attempt to 
equate pairs for age and sex. Members of each pair 
were then randomly assigned to each of the two training 
methods. In Session 3 (5 to 7 days after Session 2), five 
recall tests (short-term recall) and five near-transfer 
tests were administered. In Session 4 (14 days after 
Session 3), three recall tests (long-term recall) and a test 
for conservation of number (far transfer) were administered.
The experimenter who administered the
tests did not know which training the subjects had re- 
ceived. 
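
The matched-pair assignment described in this procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual procedure: the text says only that pairs were equated for age and sex "with an attempt," so the pairing rule here (sort by sex, then age, and pair neighbors) is a hypothetical stand-in, as are the child records and the function name.

```python
import random
from typing import Dict, List, Tuple

Child = Tuple[str, int, str]  # (name, age in months, sex); hypothetical record layout

METHODS = ("discovery", "expository")


def assign_matched_pairs(children: List[Child], rng: random.Random) -> Dict[str, str]:
    """Pair similar children, then randomly split each pair across methods."""
    # Assumed matching heuristic: neighbors after sorting by sex and age.
    ordered = sorted(children, key=lambda c: (c[2], c[1]))
    assignment: Dict[str, str] = {}
    for i in range(0, len(ordered), 2):
        pair = ordered[i:i + 2]          # the last "pair" may hold one child
        methods = list(METHODS)
        rng.shuffle(methods)             # random order of methods within the pair
        for child, method in zip(pair, methods):
            assignment[child[0]] = method
    return assignment


children = [
    ("A", 44, "F"), ("B", 46, "F"), ("C", 52, "F"),
    ("D", 58, "F"), ("E", 47, "M"), ("F", 49, "M"),
]
print(assign_matched_pairs(children, random.Random(0)))
```

Randomizing within pairs, rather than across the whole sample, keeps the two training groups balanced on the matching variables whatever the outcome of the shuffle.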

Materials. The materials consisted of 12 red and 12 
blue poker chips (3.75 cm in diameter), 12 small red 
poker chips (approximately 1.75-cm diameter), 12 
wooden blocks (2.5-cm sides), and a 25 × 35 cm piece of
white cardboard separated lengthwise by a 1-cm ridge.
This board was always placed in such a manner that the
ridge was parallel to the edge of the table. In addition,
standard data forms and a package of colored gummed
stars were used.

Pretest. For the pretest, 6 blue poker chips were 
placed on the experimenter’s side of the board (the side 
farthest from the subject), next to the ridge, and equally 
spaced. The subject was given the red chips with the
instructions, “Now you put just as many red poker chips 
on your side. Make it so there are just as many red ones 
as blue ones.” Following the subject’s response, a second
trial with 7 poker chips was given. The criterion
for passing the pretest was a correct response on either 
or both of the trials. 

Expository training. One blue poker chip was placed 
on the experimenter's side of the board next to the ridge, 
with instructions for the subject to watch carefully. 
One red poker chip was then placed directly opposite 
the blue one, and the experimenter said, “See, I’m 
putting just as many poker chips on your side as there 
are on my side.” The red one was then removed, and 
the subject was asked to repeat what the experimenter 
had done. If the subject did not correctly place a red 
chip opposite the blue one, the experimenter demon- 
strated again. This continued until the subject re- 
sponded correctly. The procedure was then repeated 
with 2, 3, 4, 5, 6, and 7 poker chips. Each error was 
corrected by the experimenter demonstrating again how 
to place the red chips. The training ended after the 
subject had correctly matched 7 blue poker chips with 
7 red ones. 

Discovery training. One blue poker chip was placed 
on the experimenter's side of the board next to the ridge, 
with the instructions, “Now you put just as many red
poker chips on your side.” If the subject did not do it
correctly, then the experimenter demonstrated, as in 
the expository training. After one chip was correctly 
matched, two chips were presented. 

If the subject did not correctly match 2 chips, the 
experimenter did not demonstrate immediately but 
returned to 1 chip, letting the subject match 1 chip 
again. If the subject still did not match 2 chips correctly 
the second time, then the experimenter demonstrated 
how to do it. This same procedure was then repeated
with 3, 4, 5, 6, and 7 poker chips. Each time the subject 
made an error, the experimenter returned to the pre- 
vious number, with demonstrations only as needed the 
second time around. The training ended after the 
subject had correctly matched 7 blue poker chips with 
7 red ones. 
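
The difference between the two training procedures lies in when the experimenter demonstrates, and a short simulation may make that difference easier to see. The sketch below is an interpretation under stated assumptions: `make_child` is a hypothetical simulated subject, a demonstration is assumed to be followed by another attempt until the child succeeds, and the child's response on the step back to the previous number is not checked.

```python
from typing import Callable, Dict


def make_child(failures: Dict[int, int]) -> Callable[[int], bool]:
    """Hypothetical child who fails the first `failures[n]` attempts at n chips."""
    remaining = dict(failures)

    def respond(n: int) -> bool:
        if remaining.get(n, 0) > 0:
            remaining[n] -= 1
            return False
        return True

    return respond


def expository_training(respond: Callable[[int], bool], max_n: int = 7) -> int:
    """Experimenter demonstrates every number first; returns demonstrations given."""
    demonstrations = 0
    for n in range(1, max_n + 1):
        while True:
            demonstrations += 1          # demonstrate, then the child repeats
            if respond(n):
                break
    return demonstrations


def discovery_training(respond: Callable[[int], bool], max_n: int = 7) -> int:
    """Child tries first; demonstrations are given only as needed."""
    demonstrations = 0
    for n in range(1, max_n + 1):
        if respond(n):
            continue                     # matched without help
        if n > 1:
            respond(n - 1)               # return to the previous number
            if respond(n):               # second attempt, still without help
                continue
        while True:                      # demonstrate until the child matches (assumed)
            demonstrations += 1
            if respond(n):
                break
    return demonstrations


print(expository_training(make_child({})))      # error-free child
print(discovery_training(make_child({})))       # error-free child
print(discovery_training(make_child({2: 2})))   # fails twice at 2 chips
```

Both functions end once 7 chips have been matched, so simulated children reach the same criterion; what differs is how many demonstrations they see on the way.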

Short-term recall tests. Each subject was tested on 
the numbers 5, 6, 7, 8, and 9 with a procedure identical 
to that of the pretest. For the numbers 8 and 9, the 
board was lengthened by adding an extra part to it. 

Near-transfer tests. Five tests involving one-to-one
correspondence were administered in which the conditions
were slightly different from the training. The
instructions to the subjects were always to put “just as
many” objects on their side as there were on the experimenter's
side of the board. The tests were as follows:

1. Different objects. Tan, 2.5-cm wooden blocks 
were used on both sides of the board instead of poker 
chips. One trial was given using 6 blocks.

2. Smaller chips. The subject was given small 
(1.75-cm diameter), red chips, while the experimenter's 
blue ones remained the same large size. One trial was 
given with 7 chips.

3. Chips close together. The experimenter placed 
the chips in a row next to the ridge with no spaces be- 
tween them. The regular large-sized chips were used. 
One trial was given using 6 chips. 

4. Two rows of chips. The experimenter placed 7
chips on the board in two rows. One row was next to the
ridge and contained 4 chips, while the other was farther
back and contained 3 chips opposite the spaces between
the first four and making a zigzag pattern. One trial
was given.

5. Piles of chips. The experimenter’s chips were
placed in 3 piles of 2 chips each next to the ridge, with
the top chips covering about half of the bottom chips.
One trial was given.

Table 1
Mean Percentage of Correct Responses by the Two Training Groups on the Four Types of Tests

Experiment 1
                            1 week after training      3 weeks after training
Instructional method        Short-term   Near          Long-term   Conservation
                            recall       transfer      recall      (far transfer)
Discovery group (n = 10)    96           72            10          86
Expository group (n = 9)    80           64            [...]       [...]
t test (df = 17)            ns           ns            p < .08     p < .01

Note. The Group × Delay of Test interaction was significant at p < .05. Two delay scores were short-term recall and near transfer
versus long-term recall and conservation.

Long-term recall tests. The subjects were tested on 
the numbers 5, 7, and 9 with a procedure identical to 
that of the pretest. 

Conservation test (far transfer). The experimenter 
first presented 6 blue chips with instructions identical 
to those of the pretest. Any errors were corrected. 
Once there were 2 rows of chips in correct one-to-one 
correspondence, the experimenter lengthened the row 
of red chips by spreading them out, so that they sur- 
passed the row of blue chips by one at each end. The 
subjects were then asked, “Are there still just as many 
red ones as blue ones, or does one of us have more?” 
The subjects’ answers were written down. The same
procedure was repeated with 7 chips, except that the red
chips were pushed together instead of spread out, so
that they were surpassed by one blue chip at each
end.


Results 


Thirty-four subjects passed the pretest 
and were eliminated from the study. Of the
28 subjects who failed, 19 completed the 
training and tests. Therefore, there were 9 
dropouts, all of whom quit at some point on 
their own accord (refused to play). Two of 
the children dropped out after the pretest, 
2 during discovery training, 3 during expository
training, and 2 after completing the
expository training but before the tests.

Of the 19 subjects who completed the 
training and tests, 10 were in the discovery 
group and 9 in the expository group. The 
average age of the discovery group subjects 
was 4 years 1 month (48.8 months), while the 
expository group subjects averaged 4 years 
0 months in age (47.9 months). There was
no significant difference between the mean
ages of the two groups, t(17) = .32, p > .20.
The discovery group contained 7 girls and
3 boys, while the expository group contained
3 girls and 6 boys. Only three of the discovery
group subjects required demonstrations
during training, with an average of
three demonstrations per subject. The
other 7 subjects either made no errors or
corrected themselves.


Two separate analyses of variance were
computed on the data. The first had one
between-subjects factor (the type of training
received) and one within-subjects factor (the
type of test). The effect of the type of
training was significant, F(1, 17) = 4.66, p <
.05, with the discovery group performing
better than the expository group.
The effect of the type of test was also significant,
F(3, 51) = 28.7, p < .001. However,
the main prediction of the assimilation
theory is [...]
above but did not quite reach significance at
the .05 level, F(3, 51) = 2.98 [...].

The second analysis of variance also had
one between-subjects factor (type of training),
which is identical to the first analysis,
but two within-subjects factors: delay of
test (1-week delay or 3-week delay) and
type of test (recall or nonrecall). For purposes
of this analysis, the conservation and
near-transfer tests were both considered as
nonrecall tests. As expected, the effects due
to delay of test and type of test were both
significant, F(1, 17) = 134.3, p < .001, and
F(1, 17) = 842, p < .001, respectively.
However, there was also an interesting pattern
of Group × Delay interaction, F(1, 17)
= 56.99, p < .001, with the groups performing
at similar levels on the two tests given at
1 week (short-term recall and near transfer),
but the discovery group performing better
on the tests given at 3 weeks (long-term recall
and conservation). In addition, t tests
revealed that the two groups did not differ
in overall performance on the combined
short-term recall and near-transfer tests,
t(17) = 1.02, ns, but the difference was
significant for performance on the combined
long-term recall and conservation tests,
t(17) = 2.60, p < .02.
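
The degrees of freedom reported for the two analyses in Experiment 1 follow from the design, and a short calculation can confirm them. The sketch below uses the standard textbook rule for a mixed design with one between-subjects factor and one within-subjects factor; the function name and return layout are assumptions made for illustration.

```python
from typing import Dict, Tuple


def mixed_anova_df(n_subjects: int, n_groups: int, n_levels: int) -> Dict[str, Tuple[int, int]]:
    """Degrees of freedom for a groups x repeated-measure design.

    n_levels is the number of levels of the within-subjects factor.
    """
    between_error = n_subjects - n_groups
    within_error = (n_levels - 1) * between_error
    return {
        "group": (n_groups - 1, between_error),
        "within": (n_levels - 1, within_error),
        "interaction": ((n_groups - 1) * (n_levels - 1), within_error),
    }


# First analysis: 19 subjects, 2 training groups, 4 types of test.
print(mixed_anova_df(19, 2, 4))
# Each two-level within-subjects factor of the second analysis (delay, recall vs. nonrecall).
print(mixed_anova_df(19, 2, 2))
```

With 19 subjects in 2 groups, the rule gives (1, 17) for the training effect and (3, 51) for the four-level test factor and its interaction with training, and (1, 17) for every effect involving a two-level within-subjects factor, matching the F ratios quoted above.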

In order to more carefully analyze this 
trend, t tests were conducted to determine 
whether the difference in performance for 
the two groups was significant for each of the 
four tests. The two groups did not differ in 
performance on short-term recall (t < 1) nor 
near transfer (t <1). These tests measured 
behaviors identical or very similar to those 
taught during instruction; apparently, as 
predicted, the groups did not differ in their 
levels of mastery of the presented material. 
In addition, the discovery group outper- 
formed the expository group at only a mar- 
ginally significant level for the long-term 
recall test, t(17) = 1.94, p < .08. The only 
statistically significant difference was ob- 
tained for the test of far transfer in which the 
discovery group performed better on the 
conservation task than the expository
subjects, t(17) = 3.03, p < .01.
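
The significance levels attached to the t tests above can be checked numerically. The sketch below integrates the Student t density with Simpson's rule so that it needs only the standard library; it assumes the reported tests were two-tailed, which the text does not state, and the function names are illustrative.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p value, using Simpson's rule on the interval [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps                        # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3         # probability mass between 0 and |t|
    return 1 - 2 * central_area


# t statistics reported for Experiment 1, all with 17 degrees of freedom.
for label, t in [("long-term recall", 1.94), ("conservation", 3.03),
                 ("combined 3-week tests", 2.60), ("combined 1-week tests", 1.02)]:
    print(f"{label}: t(17) = {t:.2f}, p = {two_tailed_p(t, 17):.3f}")
```

Under the two-tailed assumption, the computed values fall on the same side of each threshold as the levels reported in the text.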


Experiment 2 


Experiment 2 was conducted in order to 
provide a replicatory test of the interesting 
results of Experiment 1. In particular, Experiment
2 provided an additional test of the
prediction that discovery and expository 
instructions should lead to similar levels of 
performance on tests of mastery of the pre- 
sented material (short-term recall and near 
transfer), but that discovery should lead to 
broader outcomes capable of superior long- 
term retention and far transfer to novel sit- 
uations (long-term recall and conservation). 
Experiment 2 used different materials and 
a slightly different set of tests and included
a third group (observation group) in which 
the subjects merely watched other people 
move the pieces. In Experiment 2, the 
conservation test was given at both 1 week 
and 3 weeks in order to overcome the prob- 
lem that delay and transfer were confounded 
in Experiment 1. If discovery subjects 
outperform other groups on the conservation 
test, even when it is presented after only 1 
week, that would be support for the idea that 
learning outcomes differed in breadth rather 
than solely in terms of long-term reten- 
tion. 


Method 


Subjects. The subjects were 24 children between the 
ages of 3 years 3 months and 5 years 0 months, who at- 
tended two private schools near Santa Barbara, Cali- 
fornia and who failed a pretest for one-to-one corre- 
spondence (out of a larger group of 53 pretested chil- 
dren). The children from School A (a half-day nursery 
school) came primarily from white, middle-class, and 
upper middle-class homes; while the children from 
School B (a day-care center) came from both white and 
black, middle-class homes. Written parental permission
was obtained for all subjects.

Design. There were three between-subjects groups: 
9 subjects served in the discovery group, 8 subjects 
served in the expository group, and 7 subjects served in 
the observation group. Since all subjects took the same 
five posttests, comparisons by type of posttest are 
within-subjects comparisons. 

Procedure. Each subject who failed the pretest
participated individually in four sessions, sitting opposite
the experimenter at a table in the school. No [...]
eliminated from the study as in Experiment 1. In
Session 2 (1 to 14 days later), those who failed the pretest
were given one of three training programs for one-to-one
correspondence. Subjects were grouped in trios,
equated as much as possible for age, sex, and school
attended. Members of each trio were then randomly
assigned to each of the three training methods. In
Session 3 (5 to 7 days after Session 2), the recall test
(short-term recall), near-transfer test, and conservation
test (far transfer) were given. In Session 4 (14 days
after Session 3), the same recall (long-term recall) and
conservation tests (far transfer) were administered as
in Session 3.

Materials. Materials consisted of 12 small, plastic, 
black-and-white panda bears; 12 small, plastic, partly 
eaten apples; 12 red and 12 blue standard poker chips 
as used in Experiment 1; and the board and data sheets 
used in Experiment 1. 

Pretest. The children were first familiarized with 
the materials, learned to name each object, and were 
shown the bear “eating” one of the apples. Six bears 
were placed on the experimenter’s side of the board next 
to the ridge, equally spaced, and facing the subject, with 
the instructions, “Now you put just as many apples on 
your side. Make it so there are just as many apples as 
bears.” Following the subject’s response, a second trial 
with 7 bears was given. The criterion for passing the 
pretest was a correct response on either or both of the 
two trials. 

Discovery and expository training. The discovery 
training and the expository training were identical to 
Experiment 1, with the exception that bears and apples 
were used instead of poker chips. 

Observation training. Observation training was 
done in the form of a game in which the experimenter 
placed various numbers of bears on one side of the board 
and apples on the other side, and the subject had to 
judge each time whether it was done “right” or “wrong.” 
“Right” was the appropriate answer for a correct one- 
to-one correspondence, that is, the same number of 
apples as bears. Objects were placed in the following
manner, each pair beginning with the number of bears
on the experimenter’s side of the board: 1 and 1, 2 and
2, 2 and 3, 3 and 2, 3 and 3, 4 and 5, 4 and 4, 5 and 4, 5
and 5, 6 and 5, 6 and 6, 7 and 6, and 7 and 7. The apples
(with the exception of the extra one or the missing one)
were each placed directly opposite a bear. In order to
familiarize the subjects with the same vocabulary as was
used in the other two training methods and the tests,
during each trial the experimenter said, “I’m going to
put just as many apples as bears here. Look, are there
just as many apples as bears? Did I do it right or
wrong?” If the subject made an incorrect judgment,
the experimenter simply corrected by saying, “No, I did
[...]
again in the same order as many times as needed until
six successive correct judgments occurred.

[...] follows:


1. Different objects. Blue and red poker chips were
used instead of bears and apples. One trial was given
using 7 poker chips.

2. Two rows of bears. The experimenter placed 7
bears on the board in 2 rows of 4 and 3, in a zigzag
pattern. One trial was given.

3. Bears close together. The experimenter placed
6 bears in a row next to the ridge with no spaces between
them. One trial was given.

4. Piles of bears. Six bears were placed next to the
ridge in 3 piles of 2 bears each. One trial was given.

These four tests were equivalent to Transfer Tests 
1, 4, 3, and 5, respectively, of Experiment 1. 

Conservation test (far transfer). The conservation
test was identical to the conservation test of Experiment 
1, except that bears and apples were used instead of 
poker chips. Two trials were given at both test ses- 
sions. 


Results 


Sixteen subjects passed the pretest and 
were eliminated from the study. Of the 37 
subjects who failed, 24 completed the 
training and tests. Therefore, there were 13 
dropouts. Two of the children were dropped 
after the pretest because of inability to un- 
derstand the instructions, 2 were dropped 
because of inability to learn the observation 
training, and 2 left school before completion
of the study. The others dropped out of
their own accord (refused to play): 3 after
the pretest, 1 after discovery training, 1 after
expository training, and 2 after observation 
training. 
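
The subject accounting for Experiment 2 can be verified with simple arithmetic. The sketch below restates the counts given in this paragraph and in the Subjects and Design sections; the dictionary labels are paraphrases introduced for illustration.

```python
pretested = 53
passed_pretest = 16
failed_pretest = pretested - passed_pretest      # eligible for training

# Reasons for attrition, as listed in the text.
dropouts = {
    "could not understand instructions (after pretest)": 2,
    "could not learn the observation training": 2,
    "left school before completion": 2,
    "refused after the pretest": 3,
    "refused after discovery training": 1,
    "refused after expository training": 1,
    "refused after observation training": 2,
}

group_sizes = {"discovery": 9, "expository": 8, "observation": 7}

total_dropouts = sum(dropouts.values())
completed = sum(group_sizes.values())

print(f"failed pretest: {failed_pretest}")
print(f"dropouts: {total_dropouts}, completed: {completed}")
```

The itemized dropouts sum to the 13 reported, and the three group sizes sum to the 24 subjects who completed the study.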

Of the 24 subjects who completed the 
training and tests, 9 were in the discovery
group, 8 in the expository group, and 7 in the
observation group. The mean ages of the
subjects were 4 years 2 months for the discovery
group, 4 years 1 month for the expository
group, and 3 years 11 months for the
observation group (49.8 months, 49.0
months, and 46.7 months, respectively); t
tests revealed no significant differences between
these means. The discovery group
contained 3 girls and 6 boys, while the ex- 
pository group contained 4 girls and 4 boys, 
and the observation group 4 girls and 3 boys. 
Four of the discovery group subjects re- 
quired demonstrations during training, with 
an average of three demonstrations per 
subject. 

The tests were scored as follows: Each
subject received five different scores between
0 and 100, corresponding to the percentage
of correct answers for each of the five types
of tests (recall, transfer, and conservation
tests 1 week after training, and recall and
conservation tests 3 weeks after training).
The group means for the five tests are shown
in Table 2.


Table 2
Mean Percentage of Correct Responses by the Three Training Groups on the Five Types of Tests

                             1 week after training                       3 weeks after training
Instructional                Short-term   Near       Conservation     Long-term    Conservation
method                       recall       transfer   (far transfer)   recall       (far transfer)
Discovery group (n = 9)      78           53         (illegible)      (illegible)  (illegible)
Expository group (n = 8)     63           50         (illegible)      (illegible)  (illegible)
Observation group (n = 7)    71           40         0                57           0
F test (df = 2, 21)          ns           ns         p < .025         p < .08      p < .06

Note. The Group × Type of Test interaction was significant at p < .05. Two types of tests were short-term recall and near transfer
versus long-term recall and the two conservation tests.


Two separate analyses of variance were
computed on the data. The first had one
between-subjects factor (the type of training)
and one within-subjects factor (the type
of test). The main effect of type of training
did not reach significance at the .05 level,
F(2, 21) = 2.88, p < .10, but the effect of type
of test was significant, F(4, 84) = 29.63, p <
.001. However, the Group × Test interaction
was not significant, although it did appear
to be in the same direction as in Experiment 1,
F(8, 84) = 1.03, ns. Based on
the Group × Delay interaction in Experiment 1,
the main prediction of the assimilation
theory for Experiment 2 is a pattern
of interaction in which all groups perform at
similar levels for short-term recall and near
transfer, but the discovery group shows an
advantage for far transfer and long-term
recall. A second analysis of variance was
therefore conducted, which grouped the
short-term recall and near-transfer scores
together and the long-term recall and two
conservation test scores together. There
was one between-subjects factor (whether
the training was expository, discovery, or
observation) and one within-subjects factor
(type of test). As in the previous analysis,
the groups did not differ in overall performance,
but there was a significant difference
in the difficulty of the two types of tests,
F(1, 21) = 17.41, p < .001. Furthermore, the
predicted Group × Test interaction was
significant, F(2, 21) = 3.60, p < .05, and
showed the same trend as in Experiment 1.

In order to more closely investigate these
differences, individual one-factor analyses
of variance were conducted for differences
among the three groups on each of the five
tests. As in Experiment 1, there were no
significant differences among the groups on
the short-term recall test given 1 week after
training (F < 1) nor on the near-transfer test
given 1 week after training (F < 1). However,
there were significant differences for
the conservation test given 1 week after
training, F(2, 21) = 5.42, p < .025. A Newman-Keuls
multiple-range test performed
on these data revealed that the discovery
group performed significantly better than
the expository group (p < .05) and that the discovery
group performed significantly better
than the observation group (p < .05), but as
expected, there were no differences between
the expository and observation groups on the
1-week conservation test. This result helps
extend the results of Experiment 1 in that
the discovery group performed best on a test
of far transfer even when it was presented
closer in time to original learning. On the
long-term recall test given 3 weeks after
training, there was only a marginally significant
effect, F(2, 21) = 2.96, p < .08, and the
only pairwise difference that reached statistical
significance (p < .05) based on a
Newman-Keuls test was the difference between
the discovery and observation
groups.
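
The degrees of freedom reported for the two mixed analyses follow directly from the design: 24 subjects in 3 groups, each measured on 5 tests (or on 2 pooled test types in the second analysis). As a rough check, the sketch below computes them for a generic design with one between-subjects and one within-subjects factor; the function name and argument names are illustrative and do not come from the original report.

```python
def mixed_anova_df(n_subjects: int, n_groups: int, n_levels: int) -> dict:
    """Degrees of freedom (effect, error) for a design with one
    between-subjects factor and one within-subjects factor."""
    between = n_groups - 1
    between_error = n_subjects - n_groups
    within = n_levels - 1
    within_error = within * between_error
    return {
        "group": (between, between_error),
        "test": (within, within_error),
        "group_x_test": (between * within, within_error),
    }


# First analysis: 3 training groups by 5 tests.
first = mixed_anova_df(n_subjects=24, n_groups=3, n_levels=5)
# Second analysis: 3 training groups by 2 pooled test types.
second = mixed_anova_df(n_subjects=24, n_groups=3, n_levels=2)

print(first)
print(second)
```

Under these assumptions the first call gives (2, 21), (4, 84), and (8, 84), and the second gives (1, 21) for type of test and (2, 21) for the interaction, which agrees with the F ratios reported above.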

The conservation test given at 3 weeks
produced only a marginally significant effect
for differences among the three groups, F(2,
21) = 3.31, p < .06, and only the difference
between the discovery and observation
groups reached statistical significance at the
.05 level based on a Newman-Keuls test. It
should be noted, however, that unlike Experiment 1,
the conservation test given at 3
weeks is a repeat of an earlier test. Finally,
a one-factor analysis of variance was performed
on the combined performance on
both conservation tests for the three groups.
The three groups differed significantly in
performance on the conservation tests, F(2,
21) = 5.26, p < .025. A Newman-Keuls test
revealed that the differences between the
discovery and expository groups and between
the discovery and observation groups
were both significant (p < .05), while the
difference between expository and observation
groups was not. These results complement
those of Experiment 1: Although
there were no differences between discovery
and expository groups for short-term recall
or near-transfer tests, there were statistically
significant differences on tests for far
transfer (conservation).


Conclusions 


These results provide some evidence
concerning the effects of discovery and active
manipulation of objects on children's
learning of number concepts. If this study
had used a posttest based only on mastery of
the presented information (i.e., only a recall
or near-transfer test), there would have been
no evidence of differences among the training
groups in either Experiment 1 or Experiment 2.
However, when posttests are
given that include far transfer, such as conservation
tests and long-term recall tests,
important differences emerge. In particular,
the discovery group performed relatively
better than the expository group on far
transfer, while both groups performed at
similar levels for short-term recall and near
transfer. Another interesting piece of information
is that in Experiment 2, there were
no reliable differences in the performances
of the expository and observation groups.
Hence, without discovery, the active manipulation
of objects (expository group)
seemed to have little positive effect.

These findings are consistent with the idea
that the discovery procedure encouraged
subjects to activate their existing cognitive
structures concerning number concepts and
to assimilate the new information to form a
broader learning outcome. Subjects in the
expository and observation treatments apparently
were more likely to add the new
behaviors to memory without connecting
them to related ideas. It should be noted
that the discovery procedure developed for
these studies was such that subjects continued
on the problem until they reached a level
of mastery.

These results also encourage further work
in determining the optimal conditions for
and limits of discovery techniques. The
present experiment was successful in demonstrating
differences in the breadth of
transfer due to discovery; the situation was
such that discovery subjects learned to criterion
(Condition 1), discovery subjects were
likely to possess prerequisite concepts
(Condition 2), and nondiscovery subjects
were not likely to normally use assimilative
learning strategies (Condition 3). In order
to more fully understand the effects of discovery
instruction on retention and transfer,
attention must also be paid to these additional
cognitive variables for any particular
situation.

An important pedagogic implication of
these findings is that equivalent mastery on
a behavioral level such as was displayed by
each of the treatment groups does not guarantee
equivalency in “what is learned.”
There may be situations in which mastery of
the presented information (in this case,
performance on tests of one-to-one correspondence)
is a sufficient instructional objective.
However, in the case of number
concepts, the broader learning outcomes
produced by discovery seem to be desirable
“cognitive objectives” (Greeno, 1976),
especially since it is unlikely that all the
necessary objectives for a lifetime of mathematics
learning could be individually
taught. These results suggest that there are
definable situations in which discovery can
lead to subjects acquiring competencies for
objectives that were not specifically ...


... problems.


It has been argued that education should 
be oriented less toward acquisition of par- 
ticular content or of particular content- 
specialized skills than it is now and more 
toward teaching of broadly generalizable 
cognitive skills that can be applied in a va- 
riety of new situations (Hudgins, 1973; Olton 
& Crutchfield, 1969; Rigney, Note 1; Wein- 
stein, Note 2). Such an approach requires 
extensive research, however, on how and to 
what extent broad cognitive strategies or 
skills can be taught. 

In this article, we are concerned with
teaching of problem-solving strategies.
There are a number of studies directed
toward facilitation of problem solving, but
this number appears small when one considers
what a wide diversity of tasks have
been called problem solving (Davis, 1966).
Also, the success of these previous attempts
has been mixed. With insight problems, for
example, several studies have given negative
results (e.g., Anderson & Anderson, 1963;
Duncan, 1961). Some successful studies
have involved training that was highly specific
to the target problem; Saugstad (1957),
for example, was able to facilitate solution of ...

... seem extremely valuable, but there is a
need for more analytic understanding of
which component treatments and program
features are essential to their success and
which are not.


as perceptual reorga 
firm line can be d 


problems an types, insight problems 
With reports of sudden 
tions that occur when 
narrow perceptions of
come.


Two types of training were emphasized
in these studies: problem-reformulation
training and visualization training.
Both can be related to Johnson’s (1955) di- 
vision of problem solving into three stages: 
preparation, production, and judgment. 
Essentially, the preparation stage is said to 
involve the process of comprehending the 
problem, its givens and goals, and its rules 
for moving from givens to goals. Bourne, 
Ekstrand, and Dominowski (1971) point out 
that “most researchers have studied ‘well 
defined’ problems in which the subject is 
‘fully prepared’ for the problem by the ex- 
perimenter who poses it. Consequently 
little is known about the preparation stage” 
(p. 56). Yet, some have claimed that for- 
mation of an appropriate initial conception 
of the problem is the most difficult and vital 
prerequisite for solving real-world problems 
(e.g., Langer, 1957, pp. 15-17). With prob- 
lem-reformulation training, we tried to in- 
fluence people to concentrate on the prepa- 
ration stage of problem solving by asking 
them to question their initial formulation of 
the problem. 

With visualization training, students were 
instructed to form precise, detailed visual 
images to achieve clarity in their compre- 
hension of verbally presented problems. 
Visualization training was attempted for 
several reasons. First, it appeared that adequate
conceptualization of certain problems
requires careful visualization of problem
components. Some “chain-cutting problems”
become easier, for example, when it is
realized that cutting a single link of a chain
cuts the chain “in three” segments rather
than “in two” (i.e., the cut link can be separated
from the segments on each side of it);
it was predicted that instructions to visualize
the problem carefully might help subjects
“see” this relationship. Second, there is
research evidence that visual imagery is
beneficial in a number of cognitive tasks, and
there is anecdotal evidence that it is important
in scientific discovery and insightful
problem solving (McKeller, 1957).

We were interested in the possibility of
combined as well as separate effects of visualization
and problem-reformulation instructions.
In either case, strategy suggestions
were presented repeatedly in conjunction
with attempts to solve training
problems; thus, instructions were integrated
with practice on particular problems.


Experiment 1 


Method 


Subjects. Subjects were 116 students of introductory 
educational psychology at the University of Texas at 
Austin, who participated to fulfill a course require- 
ment. 

Design and procedure. Subjects participated in 
groups ranging from three to nine persons. Each was 
exposed to a training procedure for up to an hour; then, 
after a 5-minute break, was allowed to work on test 
problems for 50 minutes. Subjects were assigned ran- 
domly to one of four treatments to achieve 29 subjects 
per treatment. The four treatments differed as follows 
on training: 

1. Dual training. Subjects attempted to solve each 
of eight insight problems in a training booklet. Each 
problem was defined as an insight problem because it 
could be described in terms of some assumption that 
had to be overcome. One problem used, for example, 
was the traditional “nine dot problem,” which cannot 
be solved if it is assumed that one must stay within the 
boundaries defined by the dots. 

Another example is the following problem: 


If a standard-sized cigarette can be rolled out of 6 
standard-sized cigarette butts, how many ciga- 
rettes can be made and smoked from 36 butts? 


The assumption to be overcome is that this is a prob- 
lem with a one-stage solution. A new cigarette can be 
made from 6 butts in a second stage for a total of 7. 
After subjects had worked on each problem, they were
given the solution along with instructions that emphasized
two kinds of strategies for solving these problems.
First, the common false assumption for that problem
was described (e.g., that it has a one-stage solution), and
subjects were urged to work continually on reformulating
their view of the problem to be sure they were not
defining it too narrowly or making unnecessary assumptions
about problem requirements. Second, they ...
image of the problem components. ...
problems were used to illustrate the need for systematic
attention to detail in the formation of images. These
instructions were repeated with modified wording after
each problem, using the current problem as a concrete
example for the abstract points made. Subjects were
given a fixed amount of time to work on each problem
before the answer and the accompanying instructions
were given. In this and in the other conditions, diagrams
accompanied answers where appropriate, and
hints were given for two of the problems during the time
allotted to work on them.

2. Visualization training. This group received the
visualization strategy instructions but not the instructions
pertaining to problem reformulation. Otherwise,
all procedures for this group were identical to those for
the dual training group.

3. Practice. Subjects were given no advice during
the training period about effective strategies, but otherwise
they were treated like subjects in the two previous
conditions. This condition was included to see
whether simple practice and feedback without instruction
might have an effect.

4. Control. Subjects in this condition were trained
with a different set of problems in the attempt to control
for warm-up effects. Their practice booklet contained
arithmetic, logical, and search problems. This booklet
was constructed in the attempt to obtain some variety
in the kinds of problems presented but to avoid those
having a large perceptual-reorganization component.
As in other conditions, subjects were given a fixed
amount of time to work on each of eight problems before
solutions were presented. Total time for practice
problems was approximately the same in this condition
as in the other three.
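
The cigarette-butt problem quoted under the dual training condition illustrates why a one-stage reading gives the wrong answer. A minimal sketch of the multistage count is given below; the function name and its parameter names are chosen here for illustration, and the code assumes each smoked cigarette leaves exactly one reusable butt.

```python
def cigarettes_from_butts(butts: int, butts_per_cigarette: int = 6) -> int:
    """Count cigarettes that can be rolled and smoked, reusing the butt
    left over from each smoked cigarette.

    Assumes butts_per_cigarette >= 2; otherwise the supply never runs out.
    """
    if butts_per_cigarette < 2:
        raise ValueError("butts_per_cigarette must be at least 2")
    smoked = 0
    while butts >= butts_per_cigarette:
        rolled = butts // butts_per_cigarette
        smoked += rolled
        # Leftover butts plus one new butt per cigarette just smoked.
        butts = butts % butts_per_cigarette + rolled
    return smoked


print(cigarettes_from_butts(36))  # 7
```

The first pass yields 6 cigarettes from 36 butts; smoking them leaves 6 new butts, which make a seventh cigarette in the second stage.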

Testing and scoring. All groups worked on the same
test booklet, which contained 11 ...

... third ...

... number of circled (i.e., ...
did not yield a ...
treatments, ... difference
among the four groups, F(3, 112) = 5.50, p <
.01, on adjusted test booklet scores. Post ...
among ...

Table 1
Group Means and Ranges for Adjusted Scores in Experiment 1

Group                     M       Range   V+      V-
Dual training             32.03   22-53   12.59   8.24
Visualization training    26.21   18-41   10.24   6.66
Practice                  28.55   10-43   11.72   7.86
Control                   24.00   6-38    10.7    6.03

Note. V+ and V- denote means for subsets of test problems
based on experimenter evaluation of imagery function: positive
(V+) or negative (V-).


... .01) and to visualization training (p < .05).
Although the practice group appeared intermediate
between dual training and control
in performance, its contrast with each of
these groups fell short of significance.
There was a main effect for test problems,
treated as a within-subjects factor, F(10,
1120) = 45.36, p < .001; this effect indicates
that a range of problem difficulty was
achieved. There was no interaction between
test problem and treatment.
One surprise in these data was the poor
showing of the visualization training group.


nor with any kind of feature which is visibly 
changing during normal use. Describe it 
very briefly.” 


... (called V- in Table 1). Separate examination
of these two subscores was not helpful;
it can be seen that relative performance of
the visualization training group was fairly
uniform over problem types.

Results of the study are more encouraging
for reformulation training; yet, several explanations
of the superiority of the dual
training group to the control group can be
offered. One is that performance of the
control group reflects, as intended, only
warm-up effects and that the combination
of training events employed in the dual
training condition is necessary to surpass
that baseline. It is also possible that strategies
learned by the control group produced
negative transfer, which contributed to the
difference. Because many of these problems
called for systematic search of alternatives,
they may have introduced a set toward such
search and away from reevaluation of initial
perceptions of the problem. Also, a limitation
of this experiment is that it does not
indicate whether reformulation instructions
alone, without concurrent visualization instructions,
would have improved performance.
To help resolve these ambiguities,
the following study was performed.


Experiment 2 


Two control groups were employed to help
evaluate alternate explanations of the difference
between dual training and control
groups in the previous study. Also, a condition
was included with only the problem-reformulation
component of dual training
instructions.


Method 


Subjects. Eighty-four students of introductory ed- 
ucational psychology at the University of Texas at 
Austin participated to fulfill a course requirement. 

Design and procedure. The training periods of this
study were followed by breaks of 4 minutes; then, all
subjects were given 50 minutes to work on the same test
problems employed in the previous study. Subjects
participated in groups of three to eight persons at a
time. They were randomly assigned to produce 21
subjects per treatment group. The four treatment
groups differed as follows in their initial training procedure:

1. Dual training. Training for this group was
identical to that for the dual training group of Experiment 1.

Table 2
Group Means and Ranges for Adjusted Scores in Experiment 2

Group                        M       Range
Reformulation training       34.43   17-55
Dual training                28.71   19-52
Practice without feedback    26.19   12-42
Paragraph memory control     26.38   8-38



2. Reformulation training. Procedures and instructions
for this group were the same as those of the
dual training group, except that all visualization strategy
instructions were omitted. Instructions to reformulate
the problem and to question assumptions remained
intact.

3. Practice-without-feedback control. Subjects 
practiced on three of the training problems used with 
the two previous groups. Timing was the same, but 
these subjects were not given strategy instructions, 
hints, or feedback in the form of provided problem solutions.
They were told that answers would be provided
at the end of the experimental session. Practice 
problems were limited in this condition in an attempt 
to achieve control for warm-up effects with minimal 
strategy training. 

4. Paragraph memory control. Subjects were given 
2 minutes to try to memorize each of two short readings 
of 150-200 words in length, and they were given 3 min- 
utes to reproduce each passage as best they could. The 
first paragraph was a concrete, factual description of 
first-aid procedures for poison victims, and the second 
was an African folk tale borrowed from Bartlett 
(1932). 

It was hoped that these last two groups would constitute
neutral control groups, which experienced neither
strategy training nor negative transfer. It was felt
that if there was any transfer at all from training to
testing, it was likely to be positive for the practice-without-feedback
group and negative for the paragraph
memory group. Thus, comparable performance from
these two groups would increase confidence that neutral
control conditions had been achieved.

For all treatments, the test booklet, test procedures,
and scoring procedures were identical to those of Experiment 1.


Results 


Means and ranges for adjusted scores in
this study are presented in Table 2. Again,
there was no main effect for number circled
(F < 1), and only .4 of the 11 problems were
circled on the average. There was a treatment
effect for test booklet scores, F(3, 80)
= 4.16, p < .01. With Newman-Keuls
comparisons, the reformulation training
group surpassed all others at the .05 or the
.01 level. None of the other contrasts were
significant.


General Discussion


Considered together, the two studies 
suggest that problem-reformulation training 
was effective, but visualization training was 
not. The contrast between dual training 
and reformulation training in Experiment 2 
supports the indication in Experiment 1 that 
visualization instructions do not facilitate 
and may even hinder performance with in- 
sight problems. A question for future re- 
search is whether it is the emphasis on visual 
imagery or on careful, fine-grained analysis 
of problem components that is responsible 
for this result. The latter possibility appears 
to be implied in the statement by 
Ehrenzweig (1976) that "precise visualiza- 
tion, or, worse still, a straining of one's at- 
tention to see crystal-clearness where there 
is in fact none, will only produce wrong or 
unusable results" (p. 152). It remains possible,
of course, that visualization instructions
of another form, or with other kinds of
problems, would still prove beneficial.
Both studies indicate that reformulation 
training can facilitate solution of insight 
problems. They appear on first inspection 
to be inconsistent with prior studies (An- 
derson & Anderson, 1963; Duncan, 1961) 
which have shown that nonspecific transfer 
is difficult to obtain with insight problems. 
Duncan (1961), for example, obtained neg- 
ative results from attempted manipulation 
of verbal responses to problem components, 
from instruction to “use your imagination 
and not be blind to unusual possibilities” (p. 
38), and from provision of practice on a 
training problem. Thus, both practice and 
instruction were found to be ineffective. 
Duncan suggested that neither “active” nor 
“passive” training procedures were effective 
and that solution of insight problems is 
largely a matter of blind trial-and-error 
search. Practice and instruction were not 
combined in a single study, however, and it 
may be that strategy instructions will not be 
effective unless combined with appropriate 
practice or manipulation (Saugstad, 1957; 
Wong, 1975) or that practice will have a 
limited effect unless guided by instructions. 
Perhaps the greater quantity of practice and
instructions used in the present studies, or
the fact that our instructions were integrated
repeatedly with feedback about practice
attempts, accounts for the more successful
outcome obtained here.

Also, in some studies reporting negative
results, the training and transfer tasks have
involved different types of problems. It has
been shown, for example, that originality
training procedures which facilitate divergent
production do not improve performance
with insight problems (Anderson & Anderson,
1963).

Another possible reason for negative results
in prior studies is that most used performance
on only one or a few insight problems
as the criterion measure. A test with
very few dichotomously scored items, all of
which may be rather difficult for the subject,
is probably not very sensitive or reliable.
The use of more problems and the attempt
at a wider range of difficulty in the test
booklets of the present studies may also have
contributed to the more favorable outcome
of training.

It remains to be determined, on the other
hand, whether the differences obtained in
the present studies would be retained over
time or would generalize to a variety of
situational contexts (to behavioral or social
problems, for example). Since much problem-solving
research indicates decreased
positive transfer as test problems become
more dissimilar to training problems
(Tuckman, Henkelman, O'Shaugnessy, &
Cole, 1968), we must be cautious in claiming
broad usefulness for the training techniques
explored here. Our data do indicate, however,
that problem-reformulation training
has some effect, and strong general effects
might be found with some version of this
training procedure, especially one that presented
a great variety of materials over a
larger span of time. It is also possible that
reformulation training is a component or
could be made a component of broad-range
packages like the Productive Thinking
Program.

The success of reformulation training
provides support for a working assumption
of the present research, namely, that problem
reformulation is often necessary for solution
of the problems used here. Though
this research was not intended as a test of the
Gestalt theory of problem solving, its outcomes
are consistent with that theory because
the treatments that worked best had
been suggested by the Gestalt emphasis on
perceptual reorganization. Several alternative
interpretations of the training effect
remain possible, of course. This training
may help overcome what Asher (1963) calls
the “illusion of the unsolvable problem” by
affecting student attitudes about their
ability to deal with these problems, or it may
influence students to devote more attention
to the initial preparation stage of problem
solving than do other forms of training. On
the other hand, reformulation training and
visualization training may both induce close
attention to problem comprehension, but an
accompanying set toward flexibility may be
more useful than a set toward detail and
clarity with insight problems. It may be
possible to describe treatment differences in
terms of ability to overcome interference
from generalized schemata from long-term
memory (Rigney, Note 3). The present research
has demonstrated a potentially useful
training effect, but further research will be
necessary to evaluate theoretical interpretations
of that effect.


Reference Notes


1. Rigney, J. W. On cognitive strategies for facilitating
acquisition, retention, and retrieval in training and
education (Tech. Rep. No. 78). Los Angeles, Calif.:
Behavioral Technology Laboratories, University of
Southern California, 1976.

2. Weinstein, C. E. Cognitive elaboration learning
strategies. In E. Rothkopf (Chair), Learning
strategy training. Symposium presented at the
meeting of the American Educational Research Association,
New York, April 1977.

3. Rigney, J. W. Discussion prepared for paper session
titled Research in learning strategies. American
Educational Research Association, New York, April
1977.



References 


Anderson, R. C., & Anderson, R. M. Transfer of orig- 
inality training. Journal of Educational Psychol- 
ogy, 1963, 54, 300-304. 

Asher, J. Toward a neo-field theory of problem solving. 
Journal of General Psychology, 1963, 68, 3-8. 

Bartlett, F. C. Remembering. Cambridge, England:
Cambridge University Press, 1932.

Bourne, L. E., Ekstrand, B. R., & Dominowski, R. L. 
The psychology of thinking. Englewood Cliffs, N.J.: 
Prentice-Hall, 1971. 

Davis, G. Current status of research and theory in 
human problem solving. Psychological Bulletin, 
1966, 66, 36-54. 

Duncan, C. P. Attempts to influence performance on 
an insight problem. Psychological Reports, 1961, 
9, 35-42. 

Ehrenzweig, A. Unconscious scanning and dediffer- 
entiation in artistic perception. In A. Rothenberg 
& C. R. Hausman (Eds.), The creativity question. 
Durham, N.C.: Duke University Press, 1976. 

Hudgins, B. B. The improvement of children's
thinking. St. Louis, Mo.: CEMREL, 1973.

Johnson, D. M. The psychology of thought and judg- 
ment. New York: Harper, 1955. 

Langer, S. Philosophy in a new key. Cambridge,
Mass.: Harvard University Press, 1957.

McKeller, P. Imagination and thinking. Oxford, 
England: Alden & Mowbray, 1957. 

Olton, R. M., & Crutchfield, R. S. Developing the skills 
of productive thinking. In P. Mussen, J. Langer, & 
M. Covington (Eds.), Trends and issues in devel- 
opmental psychology. New York: Holt, Rinehart 
& Winston, 1969. 

Saugstad, P. An analysis of Maier's pendulum prob- 
lem. Journal of Experimental Psychology, 1957, 
54, 168-179. 

Tuckman, B. W., Henkelman, J., O'Shaugnessy, G. P.,
& Cole, M. B. Induction and transfer of search sets.
Journal of Educational Psychology, 1968, 59, 59-68.

Wong, M. R. Integrative reconciliation in meaningful
verbal learning. Cognitive Psychology, 1975, 7,
268-288.


Received June 29, 1977


Journal of Educational Psychology
1978, Vol. 70, No. 3, 378-387


Delay of Informative Feedback in Computer-Assisted Testing 
Persis T. Sturges 
California State University, Chico 


test (20-min delay), (c) 24-hr later (24-hr delay), or (d) no feedback. Retention
1 to 3 weeks later was significantly better for delayed feedback (20 min +
24 hr) than for immediate feedback; this effect was seen on items wrong initially ...

... choice format. Informative feedback was
the re-presentation of each item with indication
of the correct alternative.

The major experimental question was, 
Does delay of informative feedback affect 
retention in computer-assisted testing? 
Previous studies have compared 24-hr delay
of informative feedback with one of two
immediate feedback conditions: (a) infor-
mative feedback presented immediately
item by item (2-sec delay) or (b) the series of
informative feedback presented immediately
following the entire test (20-min delay). Al-
though in this latter condition, the series of
informative feedback is presented immedi- 
ately following the test, there is approxi- 
mately a 20-min delay between the student's 
response to an item and presentation of in- 
formative feedback for that item. Thus, 
from the theoretical definition of immediate 
informative feedback, this is not immediate 


tention (criterion) test were in a multiple-


pared three delay intervals (2 sec, 20 min, 


groups were given the score on the computer
test, total number correct and percentage
correct, 24 hr after taking the computer
test.

For both the computer-assisted test and
the criterion test, two additional measures
were used: state anxiety and confidence
ratings. One factor that may be related to
superior retention following delay of infor-
mative feedback is the amount of anxiety the
student is experiencing at the time of infor-
mative feedback. Several recent studies of
computer-assisted instruction have exam-
ined anxiety in the situation and have sup-
ported the contention that periodic state
anxiety measures can be used to investigate
the relationship between anxiety and per-
formance (e.g., O'Neil, 1972). These studies
used the State Anxiety scale of the
State-Trait Anxiety Inventory (Spielberger,
Gorsuch, & Lushene, 1970) to measure state
anxiety during the learning of materials
presented via computer-assisted instruction.
Higher levels of state anxiety were associated
with the more difficult learning materials,
and high state anxiety students were found
to make more errors in the more difficult
portion of the learning task. Thus, state
anxiety was measured to determine whether
there was a difference among the informative
feedback conditions in the amount of anxiety
at the time of presentation of informative
feedback.

The measure of confidence ratings was 


possible to utilize some of the aspects of de-
cision theory for this purpose. In
Bayes’s theorem of conditional probability,
the ratio of the conditional probabilities of 
two events given some datum (the posterior 
odds ratio) is equal to the likelihood ratio of 
the datum times the ratio of the uncondi- 
tional prior probabilities of those two events 
(the prior odds ratio). Traditionally (Ed- 
wards, 1954), the logarithm of this expres- 
sion is taken to assure the additivity of the 
various components. For the present study, 
it appeared that a subjective probability of 
the “correctness” of an answer could be ob-
tained immediately prior to any feedback (a
prior probability of being correct) and also 
during i 
feedback (a posterior probability of being 
correct). Use of Bayes's theorem as outlined 
above, then, should result in a measure of the 


interval (a likelihood ratio). 
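
The odds form of Bayes's theorem described in this passage can be written out as a short worked equation. This is a sketch only: the symbols H_1 and H_2 (the two events) and D (the datum) are notation assumed here for illustration and do not appear in the original text.

```latex
% Odds form of Bayes's theorem (H_1, H_2, D are assumed notation)
\[
\underbrace{\frac{P(H_1 \mid D)}{P(H_2 \mid D)}}_{\text{posterior odds ratio}}
  \;=\;
\underbrace{\frac{P(D \mid H_1)}{P(D \mid H_2)}}_{\text{likelihood ratio}}
  \times
\underbrace{\frac{P(H_1)}{P(H_2)}}_{\text{prior odds ratio}}
\]
% Taking logarithms makes the components additive:
\[
\log(\text{posterior odds}) = \log(\text{likelihood ratio}) + \log(\text{prior odds}),
\qquad\text{so}\qquad
\log(\text{likelihood ratio}) = \log(\text{posterior odds}) - \log(\text{prior odds}).
\]
```

Under this reading, a prior and a posterior subjective probability of being correct are enough to recover a log likelihood ratio by subtraction.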

The effect of delay of informative feed- 
back upon retention also will be measured 
separately for items initially correct or in-
correct. Anderson and his associates have
suggested a perseveration-interference
hypothesis to explain superior retention with
delay of informative feedback (Kulhavy &
Anderson, 1972). According to this hy-
pothesis, when informative feedback is
presented immediately, the student is
“perseverating” on the response which he or
she made to the item; and when this response 
is incorrect, it interferes with the correct 
response. With delay of informative feed- 
back, there is no perseveration and, thus, no 
interference. They have support for their 
hypothesis from data indicating that the 
superiority of delay of feedback is more 
marked for items initially incorrect (Surber 
& Anderson, 1975). Although it seems that 
this hypothesis does not directly explain all 
of the findings on delay of informative
feedback, it does emphasize the importance
of investigating the effect of the correctness 
of the initial response. 


Method 


Subjects and Design 


The 112 students in the upper division course in child
psychology, with two sections for each of two instruc-
tors, were assigned to groups in a randomized block
design. The blocking variable was the total score on the
first two tests in the course. Within each of four blocks
and each instructor, students were randomly assigned 
to one of four informative feedback groups (no feedback, 
2-sec delay, 20-min delay, and 24-hr delay). All stu- 
dents were tested on a retention test 1 to 3 weeks after 
the computer-assisted test. Of the 112 students who 
completed the experiment, 27% were males and 73% 
were females. 
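
A minimal sketch of a randomized block assignment of this kind is given below. It assumes that blocks are formed by ranking students on the blocking score within each instructor and cutting the ranking into four groups of equal size; that cut rule, the function name, and the data layout are illustrative assumptions rather than details reported in this study.

```python
import random
from collections import defaultdict

CONDITIONS = ["no feedback", "2-sec delay", "20-min delay", "24-hr delay"]


def assign_randomized_blocks(students, n_blocks=4, seed=0):
    """Assign students to feedback conditions within blocks.

    students: list of (student_id, instructor, blocking_score) tuples
              (hypothetical layout).
    Returns {student_id: (block_index, condition)}.
    """
    rng = random.Random(seed)

    # Group students by instructor so blocking is done within instructor.
    by_instructor = defaultdict(list)
    for student_id, instructor, score in students:
        by_instructor[instructor].append((score, student_id))

    assignment = {}
    for rows in by_instructor.values():
        rows.sort()                       # rank on the blocking variable
        size = -(-len(rows) // n_blocks)  # ceiling division: block size
        for b in range(n_blocks):
            block = [sid for _, sid in rows[b * size:(b + 1) * size]]
            rng.shuffle(block)            # random order within the block
            for i, sid in enumerate(block):
                # Deal the shuffled students across the conditions in turn.
                assignment[sid] = (b, CONDITIONS[i % len(CONDITIONS)])
    return assignment
```

Dealing a shuffled block across the conditions keeps the groups balanced on the blocking variable while leaving the individual assignment to chance.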


Apparatus 


For the computer-assisted test, a Digital Equipment 
Corporation Model PDP 11/45 was used, with four 
Teleray Corporation cathode-ray tube terminals and 
a shared hard-copy printer. An interactive test pro- 
gram of the SOCRATES system was used to administer 
the computer-assisted test. 


Materials 


was r = .75. Informative
feedback was the re-presentation


with a statement indicating
"Alternative C is correct"),


Criterion test. The criterion test consisted of a 


The short-answer test consisted of 10 questions re-
quiring a word or short phrase as an answer. Five of
these questions were the same as on the computer-as-
sisted test, the answer being the correct alternative
on that test. Five were different items over the same

Confidence ratings. For both the computer-assisted
test and the criterion multiple-choice test, the student
marked the degree of confidence he or she had in each
choice. Instructions for the confidence ratings pre-
sented the following scale, a 3-inch (7.62 cm) line with
only the two endpoints labeled:


1 9 
Guess Certain 


State Anxiety scale. The student's anxiety was
measured by the short form of the State Anxiety scale
of the State-Trait Anxiety Inventory (Spielberger et al.,
1970). The short form of the State Anxiety (A-State)
scale consists of those five items having the highest
item-remainder correlations with the normative sample
of the 20-item State-Trait Anxiety Inventory, State
Anxiety scale.


Procedure 


All students attended a lecture-discussion section in 
which Biehler's (1976) Child Development: An In- 
troduction was the text. Except for the computer- 
assisted test and the criterion test, each instructor 
conducted his classes in his regular way. Students in 
all groups took the computer-assisted test covering as- 
signed reading via the interactive testing program be- 
tween the eleventh and thirteenth weeks of the semester 
at a time scheduled by the student. Each student re-
ported to the computer testing room for two sessions 24
hr apart. At the first session, students (a) were given 
the short form of the A-State scale with standard in- 
structions, that is, “Indicate how you feel right now”; 
(b) were briefed on the use of the computer; (c) were 
given two sample items; and (d) were given the test by 
the computer, one item at a time, with subjects making 
their responses to each item by typing the letter of their 
choice and the number of their confidence rating. 

Students in the 2-sec delay group received informa-
tive feedback immediately after each item. The stu-
dents in the 20-min delay group were given informative
feedback for each item in a series at the completion of
the test. The students in the control group and the
24-hr delay group received no feedback at the first
session. Following the computer-assisted test, all
students were given the short form of the A-State scale
with retrospective instructions, that is, “Indicate how
you felt during the task you have just finished.” Stu-
dents in the 20-min delay group were then given the
short form of the A-State scale again with retrospective
instructions at the completion of their series of feed-
back.


They were then dismissed. Stu-
dents in the 24-hr delay group first took the A-State


instructions, then checked into the
computer and received informative feedback for each


Table 1
Sequence of Tasks for Each Feedback Condition for the Computer Testing Sessions

Feedback
condition   Day 1                                                         Day 2
Control     SAS; Q1 ? A1, Q2 ? A2, . . . QN ? AN; SAS                      SAS; receives score
2 sec       SAS; Q1 ? A1 IF1, Q2 ? A2 IF2, . . . QN ? AN IFN; SAS          SAS; receives score
20 min      SAS; Q1 ? A1, Q2 ? A2, . . . QN ? AN; SAS; IF1, IF2, . . . IFN; SAS   SAS; receives score
24 hr       SAS; Q1 ? A1, Q2 ? A2, . . . QN ? AN; SAS                      SAS; IF1, IF2, . . . IFN; SAS; receives score

Note. SAS = State Anxiety scale, Q = multiple-choice item, ? = ask for student's answer, A = student writes answer, IF = in-
formative feedback, subscript = item number (1 to N), and dots (. . .) indicate repetition.


Table 1 shows the sequence of tasks for each of the four
groups for the two computer testing sessions.

On the computer-assisted test, the students had
complete control of the time each question was exposed,
both before and after they had recorded their response
and confidence rating. Also, the student could control 
the time that informative feedback for each item was 
exposed, within a limit of 1 min when feedback was re- 
moved automatically. 

The criterion test was given in regular class periods, 
and it was scheduled at least 1 week and no longer than 
3 weeks after each student had taken the computer- 
assisted test. The short-answer test was given and 
collected before the multiple-choice test was presented. 
For each multiple-choice item, students wrote the letter 
of their choice and the number of their confidence rat- 
ing. After completion of the multiple-choice test, the 
A-State scale was administered with retrospective in- 
structions. The student then completed a question- 
naire on the amount of studying he or she had done 
before and after the computer-assisted test and the 
reactions to taking the test on the computer. 


Results


Criterion-Test Same Multiple-Choice
Items


Performance on the criterion-test multi- 
ple-choice items that were the same items as 
those on the computer-assisted test is of 
primary interest. Analyses were conducted
on the effect of informative feedback con-
ditions on three measures for the criterion-


test same multiple-choice items: (a) the 
mean correct, (b) the proportion correct that 
were right or wrong on the computer-assisted 
test, and (c) the amount of change in confi- 
dence ratings for items right and/or wrong 
on both tests. 

First, the effect of feedback conditions 
upon the mean correct on the criterion test 
was analyzed. Figure 1 presents the per- 
centage correct on both the computer and 
criterion tests for each feedback condition. 
Separate unweighted means analyses of 
variance of each test indicated a significant 
effect of blocks on both tests. However, 
none of the interactions between blocks and 
either instructor or feedback conditions was 
significant. Thus, an overall unweighted 
means analysis of variance for these two test 
measures was conducted with two be- 
tween-groups variables, that is, instructor 
and feedback conditions. Computer and 
criterion test scores were repeated measures. 
The overall effect of instructor as well as the 
interactions among instructor, feedback, and 
test were all nonsignificant. There was a 
significant increase in mean correct from the 
computer-assisted test to the criterion test, 
F(1, 104) = 48.56, p < .01.

[Figure 1. Percentage correct for the computer-assisted
test and the criterion test for each feedback condi-
tion. Horizontal axis: feedback condition (control, 2 sec,
20 min, 24 hr); separate curves for the criterion test and
the computer-managed test.]

Analysis of the simple main effects of
feedback conditions was conducted at each
level of the test. There was no significant
effect of feedback conditions on the com- 
puter-assisted test, F(3, 104) = .40. There 
was a significant effect of feedback condi- 
tions on the criterion test, F(3, 104) = 6.70, 
p < .01. This effect was analyzed by three a
priori planned orthogonal comparisons.
The effect of feedback conditions combined
(2 sec + 20 min + 24 hr) was significantly 
greater than that for the group with no 
feedback, F(1, 104) = 15.56, p < .01. Also, 
the mean correct for the two delayed feed-
back conditions combined (20 min + 24 hr)
was significantly greater than that for 2-sec 
feedback, F(1, 104) = 4.54, p < .05. There 
was no significant difference between 20-min 
and 24-hr delay of feedback. 

Second, the effect of feedback conditions 
upon the proportion of items correct on the 
criterion test was analyzed separately for 
items correct or incorrect on the computer-
assisted test. For the proportion of items
correct initially that were correct on the
criterion test, there was no significant dif-
ference among any of the four feedback
conditions (control = .89, 2 sec = .90, 20 min
= .93, and 24 hr = .93). Comparisons among
the feedback conditions in the proportion of
items wrong on the first test and correct on
the second test indicated the same effects as
in the analysis of the mean correct on the
criterion test: The proportion for 24 hr (.72)
did not differ from that for 20 min (.67), z =
1.24; the 20-min group (and, thus, the 24-hr
group) was significantly higher than the
2-sec group (.53), z = 3.12, p < .01; and the
2-sec group was significantly higher than the
control group with no feedback (.39), z =
3.34, p < .01.

Third, the effect of informative feedback
conditions upon the degree of change in
confidence ratings from the computer-as-
sisted test to the criterion test was analyzed.
Since the probability of a correct random
response was .25 in this four-alternative
situation, a 1 response (a guess) for either a
correct or incorrect response was set at p =
.25. The assumption was then made that
the remaining responses up to a value of 9
were linear with p; correct responses were
assessed at equal intervals from .25 to .99,
and incorrect responses were assessed at
equal intervals from .25 to .01. For both the
initial and criterion tests, the log10 of the
ratio p/q was taken as the response measure.
For example, a 1 response was associated
with p = .25 and q = .75, the odds ratio was
.25/.75 = .3333, and the log odds ratio was
log10 .3333 = -.47712. As noted in the in-
troduction, this value for the initial test is a
log prior odds ratio and for the criterion test
is a log posterior odds ratio. In accordance
with Bayes’s theorem, the difference be-
tween these two values is the log likelihood
ratio (LLR) and should assess the impact or
potency of the feedback condition per se.
Consequently, LLRs were established for
each criterion initial response set to each
question. These LLRs were then analyzed


classification (wrong-wrong,
wrong-right, right-wrong, and right-right).
It is important to note that since a difference
score was generated, it could be assumed
that between-subjects differences had been
eliminated and that only treatments and
within-subjects error affected the results.
In the event that this assumption is not en-
tirely valid, the results cited here are con-
servative and should not lead to excessive
Type I errors.


Table 2
Change in Confidence Ratings from Computer-Assisted Test to Criterion Test
(Log Likelihood Ratio)

Response Feedback condition
classification Control 2 sec 20 min 24 hr
Wrong-wrong —971 -.0298 -.1438 -.0616
Right-right .3162 .3384 .3020 .4542
Wrong-right 1.8153 1.6250 2.0500 2.4181
Right-wrong -1.5762
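
The transformation from confidence ratings to log likelihood ratios described above can be sketched in a few lines. This is an illustrative reading of the stated procedure, not the analysis program used in the study: the function names are hypothetical, and the sketch assumes that the nine ratings are spaced at exactly equal steps between the stated endpoints and that q = 1 - p.

```python
import math


def rating_to_p(rating, correct, n_points=9, chance=0.25):
    """Map a 1-to-9 confidence rating to an assumed probability of being correct.

    A rating of 1 (a guess) maps to the chance level of .25. Ratings for
    correct responses rise in equal steps to .99; ratings for incorrect
    responses fall in equal steps to .01.
    """
    if not 1 <= rating <= n_points:
        raise ValueError("rating must lie between 1 and n_points")
    end = 0.99 if correct else 0.01
    step = (end - chance) / (n_points - 1)
    return chance + (rating - 1) * step


def log_odds(p):
    """Base-10 log of the odds ratio p/q, with q = 1 - p."""
    return math.log10(p / (1.0 - p))


def log_likelihood_ratio(initial, criterion):
    """LLR for one item: log posterior odds minus log prior odds.

    initial and criterion are (rating, correct) pairs for the
    computer-assisted test and the criterion test, respectively.
    """
    prior = log_odds(rating_to_p(*initial))
    posterior = log_odds(rating_to_p(*criterion))
    return posterior - prior


# The worked example from the text: a rating of 1 gives p = .25, q = .75.
print(round(log_odds(rating_to_p(1, True)), 5))  # -0.47712
```

On this reading, a wrong-right item (for instance, `log_likelihood_ratio((5, False), (9, True))`) necessarily yields a positive value and a right-wrong item a negative one, which is why the main effect of response classification carries no information by itself.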

The log likelihood ratio for each feedback 
condition and each response classification is 
presented in Table 2. Although there was 
a significant effect of response classification, 
F(3, 3344) = 1,342.23, p < .01, this effect is 
meaningless, since it reflects the differences 
between the types of responses per se. For 
example, wrong-right responses are positive 
by definition, and right-wrong responses are 
negative by definition. There was a signif- 
icant effect of feedback conditions, F(3, 
3344) = 5.58, p < .01, and a significant in- 
teraction between feedback and response 
classification, F(9, 3344) = 8.15, p < .01. An
analysis of simple main effects of feedback 
conditions was conducted for each response 
classification. 

For responses wrong on both tests and also 
for responses right on both tests, there was 
no significant difference among the feedback 
conditions. In addition, an analysis of the 
overall level of change in confidence ratings 
was made for both the wrong-wrong answers 
and the right-right answers. The change 
was significant for both wrong-wrong re- 
sponses, F(3, 3344) = 3.50, p < .05, and for 
right-right responses, F(3, 3344) = 15.96, p
< .01. Thus, there was a significant reduc-
tion in the confidence that an item initially
wrong and still wrong was, in fact, correct; 
and there was a significant increase in con- 
fidence ratings for items initially right and 
still right. These changes occurred whether
or not feedback of the correct answer was
provided.

For wrong-right answers, there was a sig- 
nificant effect of feedback conditions, F(3, 
3344) = 16.76, p < .01. A Newman-Keuls
analysis was conducted on the differences 
among the means. At the .01 level of confi- 


dence, there was a significantly greater in- 
crease in confidence ratings for the 24-hr 
group than for each of the other three 
groups. Also, the 20-min group was signif- 
icantly greater than the 2-sec group. 
A wrong-right response is that of changing an
initially wrong response to a correct one, and 
an increased change in confidence rating is 
desirable. In this respect, 24-hr delay of 
feedback is significantly superior to all other 
conditions. Even a 20-min delay of feed- 
back is superior to immediate (2 sec) feed- 
back. 

For the right-wrong responses, there was 
also a significant effect of feedback condi-
tions, F(3, 3344) = 11.74, p < .01. At the .01
level of confidence, a Newman-Keuls anal- 
ysis indicated that the 20-min group had a 
significantly greater change in confidence 
ratings than each of the other three groups. 
A right-wrong answer is the condition in 
which an initially correct response is changed 
to an incorrect response. Therefore, an in- 
creased change in the confidence rating (that 
an item initially right but now wrong is, in 
fact, correct) is not desirable. This analysis 
indicates that the 20-min delay of feedback 
is significantly worse in this respect than any 
other feedback condition. Of particular 
interest is that no differences on this mea- 
sure were found between 24-hr delay of 
feedback and the control condition with no 
feedback. 

An additional analysis was conducted on 
the effect of feedback conditions upon the 
mean correct on the criterion test as a func- 
tion of the initial difficulty of the items. The 
items were divided into 3 sets of 10 items 
each on the basis of the percentage of stu- 
dents who had each item correct on the ini- 
tial computer-assisted test. The percentage 
correct on the criterion test for each feed-
back condition for each level of difficulty is
shown in Figure 2. An unweighted means 
analysis of variance on these data was con- 
ducted with four feedback conditions 
crossed with three levels of item difficulty. 
Both main effects were significant: feed- 
back, F(3, 108) = 7.69, p < .01, and item 
difficulty, F(2, 216) = 58.98, p < .01. An 
analysis of the simple main effects of feed- 
back conditions at each level of item diffi- 
culty indicated that there was a significant 
effect of feedback conditions only for the 
most difficult items, F(3, 108) = 11.13, p <
.01.
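
The grouping of items by initial difficulty can be illustrated with a short sketch. The function name, the input format, and the tie-breaking rule are assumptions made for illustration; the text states only that the items were divided into 3 sets of 10 on the basis of the percentage of students answering each item correctly on the initial test.

```python
def split_by_difficulty(pct_correct, n_sets=3):
    """Split items into n_sets groups ordered from hardest to easiest.

    pct_correct maps an item id to the percentage of students who answered
    that item correctly on the initial test (hypothetical input format).
    Ties are broken by item id, which is an arbitrary assumption.
    """
    ranked = sorted(pct_correct, key=lambda item: (pct_correct[item], item))
    size = len(ranked) // n_sets
    sets = [ranked[i * size:(i + 1) * size] for i in range(n_sets - 1)]
    sets.append(ranked[(n_sets - 1) * size:])  # last set takes any remainder
    return sets
```

With 30 items this yields three sets of 10, the first containing the items with the lowest initial percentage correct.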

Additional a priori planned orthogonal 
comparisons indicated the same relation- 
ships among the feedback conditions for the 
most difficult items as found in the earlier 
analysis of the overall mean correct on the 
criterion test. Conditions receiving feed- 
back were superior to that with no feedback, 
F(1, 108) = 19.23, p < .01; the two delay 
groups (20 min + 24 hr) were superior to the 
2-sec group, F(1, 108) = 7.01, p < .01; and 
there was no significant difference between 
20-min and 24-hr groups. 


Additional Measures 


Four additional measures were analyzed 
by unweighted means analyses of variance 
with two between-groups variables, that is, 
blocks and feedback conditions. The first 
was the mean correct of the 17 multiple- 
choice items on the criterion test that were 
not the same as on the computer-assisted 
test. Second, for the criterion-test short- 
answer items, each answer was scored on a 
scale of from 0 to 4 points, making the total 
possible score for the five same items and the 
five different items each 20, and these means 
were analyzed. Third, the time each item
was exposed on the computer-assisted test 


of feedback conditions was not significant.
There was a significant effect of blocks for
both the different multiple-choice items,
F(3, 80) = 2.96, p < .05, and for the short-
answer test, F(3, 80) = 8.69, p < .01. Also,
for the short-answer items, the scores for the
same and different items were a repeated
measure. The mean for the same items
(12.65) was significantly higher than that for
the different items (10.77), F(1, 80) = 12.21,
p < .01.

[Figure 2. Percentage correct on criterion test for three
levels of item difficulty for each feedback condition.]


State Anxiety Scale 


Several unweighted means analyses of
variance were conducted on the different
A-State measures. On the first measure on
Day 1, before any experimental treatments,
there was a significant difference among the
feedback groups, F(3, 80) = 4.65, p < .01.
The 2-sec group had significantly higher
A-State scores than the combined 20-min
and 24-hr groups, F(1, 80) = 12.62, p < .01.
With this result, an analysis was conducted
on the amount of change from the initial
measure to the A-State measure directly
after presentation of informative feedback.
There was no significant effect on this mea-
sure. Also, there was no significant effect on
the A-State measure administered after the
criterion test.


The means for the initial A-State measure
for Day 2 were compared between the two
groups that had received informative feed-
back on Day 1 (2 sec and 20 min) and those
that had not (control and 24 hr). This
analysis indicated that the mean A-State was
significantly higher for those who had not
received informative feedback previously
(8.30) than for those who had (7.24), F(1, 79)
= 4.41, p < .05.


Attitude Toward the Computer-Assisted 
Test 


Each student’s response to the question 
about his or her reaction to taking the test on 
the computer was put into one of three 
categories: positive, negative, and mixed or 
neutral. Of the total, 41% of the responses 
were positive, 23% were negative, and 36% 
were mixed or neutral. A chi-square analysis 
indicated that there was no significant dif- 
ference among the feedback conditions in 
the relative frequencies of responses in the 
different categories. 


Discussion 


One conclusion from the present experi- 
ment is that a consistent finding of previous 
laboratory experiments has been confirmed 
with computer-assisted tests in an educa- 
tional setting. Long-term retention of aca- 
demic material following some delay of in- 
formative feedback is superior to that with 
immediate informative feedback. It is of 
particular interest that these results oc- 
curred in an educational setting with no 
control over studying the material. Also, the 
results occurred in classes in which instruc- 
tional practices differed in some important 
ways, that is, whether or not the score on the 
computer and the criterion tests contributed 
to the grade in the class and whether or not 
informative feedback was given for each of 
the regular examinations in class. This ef- 
fect was accounted for by the proportion of 
items initially wrong that were right on the 
retention test or the more difficult items on 
the initial test. This finding is consistent 
with the perseveration-interference inter- 
pretation of the delay retention effect 
(Kulhavy & Anderson, 1972). However, the
fact that there was no difference in retention
performance between the 20-min and 24-hr 
feedback conditions for items wrong initially 
and correct on the later test does not support 
the perseveration-interference hypothesis. 

The interpretation that differences at re- 
tention were due to the conditions of pre- 
sentation of informative feedback per se and 
not to indirect effects of increased motiva- 
tion for studying (and so on) is supported by 
the present findings. There was no effect of 
feedback conditions on criterion-test mul- 
tiple-choice items that were not previously 
presented and thus did not have informative 
feedback on the initial test. Also, the con- 
trol group receiving no informative feedback 
did more poorly on the retention test, al- 
though motivation for studying should have 
been high. 

The facilitative effect of a delay of feed- 
back upon later retention was not so marked 
in the present study as it has been in previ- 
ous laboratory experiments. There was no 
difference in the accuracy on the multiple- 
choice criterion test between 24-hr and 20- 
min delay of informative feedback. Also, 
there was no effect of different feedback in- 
tervals or even of the presence of feedback 
upon the short-answer test items. The 
reason that the longer 24-hr delay did not 
result in superior retention performance in 
the present study is not clear. It is true that
several conditions of the present experiment 
“pushed the limits” of those of previous 
studies. The initial level prior to feedback 
was relatively high and the retention interval 
for many of the students was longer than has 
previously been used. Also, the reliability 
of both the computer and the criterion tests 
was relatively low. The fact that the effect 
of the feedback interval occurred only for the 
most difficult set of the initial test items does 
suggest that a greater effect might be ex- 
pected with more difficult test items. 
However, it seems likely that the present
results (or lack of results on this measure)
are due to a combination of factors.

As indicated earlier, confidence ratings 
were introduced to provide a more continu- 
ous measure of retention than the more 
traditional dichotomous right-wrong mea- 
sure. With the more continuous measure of 
the student's confidence in his or her answer,
retention following the longer 24-hr delay of
informative feedback was superior to that of 
all other feedback conditions. This finding 
is consistent with the interpretation that 
when students receive information about the 
correct answer after a longer delay interval, 
they gain more information about informa- 
tive feedback (Sturges, 1969, 1972). To say 
that students gain more information and, 
thus, show better retention following delay 
of informative feedback is consistent with 
recent interpretations that retention is a 
function of both the level or depth of pro- 
cessing and the spread of encoding (e.g., 
Craik & Tulving, 1975). Improved retention 
following greater depth of processing and/or 
spread of encoding has been reported for 
intentional as well as incidental learning and 
in a less controlled situation, more similar to 
that in the present study (Craik & Tulving, 
1975). A recent study (Seamon, Murray, &
Barclay, Note 2) reports that confidence 
ratings as well as accuracy in an incidental 
learning recognition test are higher after 
subjects have engaged in a semantic orient- 
ing task than a structural orienting task with 
meaningful words. 

It is hypothesized that after a longer delay 
interval, students engage in a more thorough 
semantic analysis of information presented 
at feedback, which accounts for the in- 
creased confidence ratings at retention. The 
findings that there was no effect of feedback 
intervals on time spent in answering the
items and in studying informative feedback
add evidence for the interpretation that re-
tention following informative feedback is a
function of how the information is processed
rather than study time and so on. 

There are several limitations to the in- 
terpretations and implications of these 
findings. Previous studies have reported 
that the relative impact of immediate and 
delayed informative feedback upon later 
retention varies with the relationships 
among the forms of the initial item, the in- 
formative feedback presentation, and the 
retention test item (Sturges, 1969, 1972). 
Thus, one limitation of the present results is 

to a situation in which items in these tasks 
are presented in the same form, that is, with 
the same alternatives in the same order. 
From the interpretation suggested by Stur-
ges (1972), it would be predicted that the
type of response—in this case, the letter of 
the alternative—is also important especially 
in relation to the forms of the initial test 
item, the informative feedback presentation, 
and the retention test item. Another limi-
tation of the present results is that there was
only one session with test and informative 
feedback. As yet, there is no evidence on the 
relative effectiveness of immediate and de-
layed informative feedback for repeated
tests throughout a course.


Effects of Anxiety, Curiosity, and Perceived
Instructor Threat on Student Verbal Behavior 
in the College Classroom 


Ruth A. Peters 
University of South Florida 


The verbal behavior of 152 college undergraduates was investigated in four 
different classes. Trait anxiety, trait curiosity, and perceived instructor 
threat were measured by the State-Trait Anxiety Inventory, the State-Trait 
Curiosity Inventory, and the Tuckman Teacher Feedback Form. Student- 
initiated questions and responses to instructor questions were rated by trained 
observers during eight 1-hour class sessions. In general, males gave more re- 
sponses than females, and students who perceived their instructors as threat- 
ening gave fewer responses than those who rated their instructors as nonthreat- 
ening. High curiosity stimulated student-initiated verbal behavior for both 
sexes, but only when the instructor was perceived as nonthreatening. For 
males, high anxiety inhibited the students' responses to instructor questions 
when the instructor was perceived as threatening, whereas females gave few 
responses to instructor questions regardless of their personality characteris-
tics.


Classroom participation is an important 
factor in the acquisition of knowledge. 
Quite often, an adequate grasp of the subject 
matter in a course depends upon discussion 
and clarification of the topics with which the 
course is concerned. Furthermore, it has 
been repeatedly demonstrated that class 
participation is positively related to aca- 
demic achievement (Amidon & Flanders, 
1961; Fahey, 1942; Flanders, 1948; Vidler & 
Rawan, 1975; Williams, 1971). Thus, opti- 
mal classroom learning appears to be facili- 
tated by active student participation. 

The college classroom is often perceived 
as a threatening environment in which many 
students do not actively participate in class 
discussions because of feelings of anxiety or 
insecurity (Weintraub, 1971). In general, 
threat appears to deter classroom partici- 
pation. For example, Kelly (1950) found 
that students who perceived their instructors 


This article is based on part of a dissertation directed 
by Charles D. Spielberger submitted to the Department 
of Psychology, University of South Florida in partial 
fulfillment of the requirements for the PhD degree. 

Requests for reprints should be sent to Ruth A. Pe- 
ters, who is now at the Northside Community Mental 
Health Center, 13301 North 30th Street, Tampa, Flor- 
ida 33612. 


as “cold” initiated significantly less verbal 
interaction in the classroom than did stu- 
dents who viewed their instructors as 
“warm.” According to trait-state anxiety 
theory (Spielberger, 1966, 1972), persons 
who are high in trait anxiety are especially
prone to perceive evaluative situations as 
threats to self-esteem and to respond to such 
situations with elevations in state anxiety. 

Studies of classroom participation have 
generally focused upon the effects of the 
questioning style of the instructor (Amidon 
& Hough, 1967). Only two studies could be 
located in which the impact of student per- 
sonality traits on classroom participation 
was investigated. Williams (1971) found 
that less active class participation was cor- 
related with neuroticism in college students, 
and Evans (1971) reported more question-
asking behavior for highly curious college
students than for those who were low in cu- 
riosity. Thus, it would appear that high 
anxiety may inhibit classroom participation, 
whereas curiosity stimulates question-asking 
behavior. 

The purpose of the present study was to 
investigate the effects of anxiety and curi- 
osity on student verbal behavior in the col- 
lege classroom. A further goal was to eval-
uate the effects of perceived instructor threat
on student verbal behavior. On the basis of 
Kelly’s (1950) findings and trait-state anx- 
iety theory (Spielberger, 1966, 1972), high 
perceived instructor threat and high trait 
anxiety were expected to deter student 
classroom participation. Generalizing from 
Evans's (1971) findings, it was expected that 
students with high trait curiosity would ask 
more questions and participate more actively 
in the classroom than students who were low 
in curiosity. 


Method 


Subjects 


The subjects were 152 undergraduate students en- 
rolled in two psychology and two sociology courses at 
a large state university. The classes varied in size, from 
25 to 42 students, and were taught by four different 
male instructors. Prior to the beginning of this inves- 
tigation, the instructors agreed to allow raters to observe 
the behavior of the students in their classes. 


Experimental Measures


The personality measures used in this study were the
A-Trait scale of the State-Trait Anxiety Inventory
(STAI; Spielberger, Gorsuch, & Lushene, 1970) and
the C-Trait scale of the State-Trait Curiosity Inventory
(STCI; Spielberger & Butler, Note 1; Spielberger et al.,
Note 2; Frain, 1977). The STAI A-Trait scale measures
anxiety proneness in social-evaluative situations.
Scores on this 20-item scale range from 20 to 80. The 
validity of the STAI as a measure of trait anxiety has 
been demonstrated in a number of studies of stress, 
anxiety, and learning (Spielberger et al., 1970). 

The STCI C-Trait scale assesses individual differ- 
ences in curiosity and exploratory behavior. Scores on 
this 15-item scale range from 15 to 60. Evidence of the 
concurrent validity of the STCI as a measure of trait 
curiosity is reflected in significant positive correlations 
of this scale with Vidler and Rawan's (1975) Academic 
Curiosity Inventory (Frain, 1977), and with Day's (Note 
i Ontario Test of Intrinsic Motivation (Butler, Note 
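
The score ranges quoted for the two trait scales follow from the item counts if every item is rated on a 4-point scale. The short sketch below makes that arithmetic explicit. The 1-to-4 item scoring is an assumption (the item format is not stated in this section), and `score_range` is a hypothetical helper introduced only for illustration.

```python
def score_range(n_items, item_min=1, item_max=4):
    """Return the (lowest, highest) possible total for a summed rating scale.

    item_min and item_max are assumed per-item score limits.
    """
    return n_items * item_min, n_items * item_max


stai_a_trait = score_range(20)  # 20-item A-Trait scale
stci_c_trait = score_range(15)  # 15-item C-Trait scale
print(stai_a_trait, stci_c_trait)
```

Under that assumption the helper reproduces the ranges given in the text, 20 to 80 for the A-Trait scale and 15 to 60 for the C-Trait scale.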

The Warmth and Acceptance scale of the Tuckman 
Teacher Feedback Form (Tuckman, 1974) was used to
measure students' perceptions of instructor threat.
This scale consists of seven semantic differential items
that require students to evaluate their teachers along
the following dimensions: patient-impatient, warm-
cold, amiable-hostile, accepting-critical, sociable-
unfriendly, fair-unfair, and gentle-harsh. Scores on
the Warmth and Acceptance scale range from 0 to 43, 
with lower scores denoting less warmth. Since scores 
on the Warmth and Acceptance scale reflect perceived 
instructor threat, this scale will be referred to as the 
Threat scale. A comprehensive review of investigations 
evaluating the validity of the Teacher Feedback Form
as a measure of students’ perceptions of instructor 
personality variables in relation to classroom structure, 
teacher behavior, and student opinions of teacher ef- 
fectiveness is found in Tuckman (1974). 


Procedure 


Students were given an opportunity to select their 
own seats and were requested to remain in these seats 
throughout the duration of the study. This procedure 
facilitated the observers’ accurately recording the verbal 
behaviors of each student. 

Two observers were assigned to each classroom (the 
author and a trained assistant) and were introduced 
during the first class session. At this time, the class was 
informed that the study was concerned with investi- 
gating the relationship between student personality 
traits, seating position in the classroom, and teacher 
behavior. The instructors had been previously in-
formed that student verbal behavior would be recorded.

The STAI A-Trait and the STCI C-Trait scales were
administered in the first class session after the students
were given the rationale for the study. Classroom ob-
servations began in the second week of the term (during
the third class session) and continued through the fifth
week. For each class, a total of eight 1-hour sessions
were observed. The observers classified student verbal
behaviors into two categories. I-type verbal behaviors
were student-initiated expressions, including procedural
and course-content-related questions as well as
declarative statements; R-type
verbal behaviors were voluntary responses by students
to the instructor's questions. For each student, the
frequency of I-type and R-type verbal responses was 
recorded during each class session. 

The measure of perceived instructor threat was ad- 
ministered to the students in the class period immedi- 
ately following the last observation session. The stu- 
dents were directed to respond to this inventory but not 
to write their names on the answer sheet. After the
students had anonymously completed the Threat scale,
they were then told that the experimenter was inter-
ested in their individual perceptions, asked to write
their names on the inventory, and assured that their
responses would not be shown to the instructors. After
completion of this procedure, the students were pre- 
sented with the rationale for this deception. It should 



1 These categories were derived in pilot work based 
on observations of student verbal behavior in two un- 
dergraduate psychology classes. 

2 Frequency tabulations for subcategories of I-type 
verbal behavior were also obtained. However, the 
findings for these subcategories were essentially the 
same as for the combined I-type verbal behavior cate-
gory. Therefore, only the analysis for the total I-type
data will be presented.




be noted that subjects had the option of refraining from 
signing their names. However, only two subjects ex- 
ercised this option. 

Feedback about the study was given in each class to 
both students and instructors in a special class session 
near the end of the term. The students were informed 
of their scores on the Curiosity and Anxiety scales.
They were also given information with regard to the 
median and upper and lower quartiles of the norms for 
these scales, so they could determine how they com- 
pared with other students. After the conclusion of the 
term, the instructors were informed of the average 
ratings they received on the Threat scale.


Results 


The means and standard deviations (in 
parentheses) of the students’ semantic dif- 
ferential ratings of perceived instructor 
threat for each instructor were 35.85 (5.17), 
33.64 (6.18), 33.50 (5.69), and 32.14 (5.72). 
In order to determine whether or not the four 
instructors differed in terms of how they 
were perceived by their classes, the instruc- 
tor threat ratings were evaluated by a one- 
way analysis of variance procedure. The F 
test for the analysis was not significant, F(3, 
116) = 1.75. Since the mean threat ratings 
for the instructors by their respective classes 
were not significantly different, the data for 
all four classes were combined. Each stu- 
dent’s individual perception of his or her 
instructor as more or less threatening was 
used to define the perceived instructor 
threat variable. In effect, this procedure 
defined perceived instructor threat as an 
attribute of the individual student. 

Complete data were available for 120
students (61 males and 59 females) who
regularly attended the classes. The median
Threat scale score for these students was 34; 
9 students with scores at the median were 
eliminated in order to provide a clear sepa- 
ration between high- and low-threat groups. 
Students with scores of 33 and below on the 
Threat scale were designated the high-threat 

group (31 males and 23 females); students 
with scores of 35 and above were designated 
the low-threat group (26 males and 31 fe- 
males). The high- and low-threat groups 
were further divided on the basis of their 
median A-Trait and C-Trait scale scores,
which were 35.5 and 47.5, respectively.
Students above these medians were desig-
nated as high anxious (high A-trait) or high
curious (high C-trait), and those below were
categorized as low anxious (low A-trait) or
low curious (low C-trait). 
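
The grouping rule described above can be sketched in a few lines, together with a check that the reported group sizes are mutually consistent. The sex breakdown (61 males, 59 females) sums to 120 students, and the high- and low-threat counts should add up to that total minus the 9 students dropped at the median. `threat_group` is a hypothetical helper, and the example scores are invented for illustration; they are not data from the study.

```python
from collections import Counter

MEDIAN_THREAT = 34  # median Threat scale score reported in the text


def threat_group(score, median=MEDIAN_THREAT):
    """Classify an integer Threat (Warmth and Acceptance) score.

    Lower scores denote less warmth, so scores below the median
    are the high-threat group.
    """
    if score < median:
        return "high threat"
    if score > median:
        return "low threat"
    return None  # exactly at the median: excluded from the analysis


# Hypothetical scores, for illustration only.
example_scores = [28, 33, 34, 35, 41]
groups = Counter(threat_group(s) for s in example_scores)

# Consistency check on the counts reported in the text.
males, females = 61, 59
dropped_at_median = 9
high_threat = 31 + 23  # males + females
low_threat = 26 + 31   # males + females
retained = males + females - dropped_at_median
print(groups, retained, high_threat + low_threat)
```

Both routes give 111 retained students, which is also the sample size implied by the error degrees of freedom in the later analyses.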

The relations between trait anxiety and 
perceived instructor threat and between trait 
curiosity and perceived instructor threat 
were evaluated in a chi-square analysis, 
which revealed that neither anxiety nor cu-
riosity was related to perceptions of in-
structor threat. Pearson product-moment
correlations indicated that there was a slight
but statistically significant negative corre-
lation between perceived instructor threat
and curiosity (r = -.17; p < .05).

The finding that high-trait anxious stu- 
dents perceived their instructors as no more 
threatening than low-trait anxious students 
was surprising. However, all four instruc- 
tors were rated as relatively nonthreatening, 
and the classroom situation may not have
been sufficiently stressful to elicit high- 
threat perceptions by the anxious stu- 
dents. 

The hypotheses of the present study were 
tested by examining the effects of anxiety, 
curiosity, and perceived instructor threat on 
student verbal expressions. A total of 568 
expressions were recorded in the 32 obser-
vation sessions, for an average of 17.7 stu-
dent verbalizations per 1-hour session.
I-type verbalizations accounted for 73% of 
total student expressions, while R-type 
verbal responses accounted for the remain- 
ing 27%. The overall interrater reliability 
(objectivity) for I-type and R-type responses
was .92 and .98, respectively. Disagreements
between observers were resolved by taking 
the mean of their ratings for each re- 
sponse. 
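
The summary figures in this paragraph can be reproduced with simple arithmetic. The sketch below assumes the 32 sessions are the four classes times eight observed sessions each, as described in the Procedure; the I-type and R-type counts are approximations back-calculated from the rounded percentages, not counts reported by the author.

```python
total_expressions = 568  # reported total across all classes
sessions = 4 * 8         # four classes, eight 1-hour sessions each

per_session = total_expressions / sessions

# Approximate counts implied by the reported 73% / 27% split.
i_type = round(total_expressions * 0.73)
r_type = total_expressions - i_type
print(per_session, i_type, r_type)
```

The exact average is 17.75 verbalizations per session, which the text reports as 17.7.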

The means and standard deviations for 
I-type verbal behavior scores for males and 
females during Period 1 (first four sessions) 
and Period 2 (last four sessions) are pre- 
sented in Table 1 for all possible combina- 
Table 1
Means and Standard Deviations for I-type Verbal Responses Emitted in Periods 1 and 2 as a
Function of A-Trait, C-Trait, and Threat for Males and for Females


                        Low threat                     High threat
               Low A-trait    High A-trait     Low A-trait    High A-trait
C-trait          1      2       1      2         1      2       1      2
Males
  High
    M          3.81   6.81    6.66   7.00      2.77   3.88     .00    .00
    SD         4.02  10.62    3.78   2.64      4.91   5.96     .00    .00
  Low
    M          1.00    .50     .87   1.50      1.57   1.57     .76   1.38
    SD          .81    .57    1.45   2.82      2.29   2.93    1.73   2.72
Females
  High
    M          2.30   3.20    2.33   2.31       .70    .60     .15    .25
    SD         3.12   4.93    3.80   2.97      1.33   1.07     .95    .50
  Low
    M           .50    .00     .37    .37       .00    .25     .20    .20
    SD         1.00    .00     .14   1.06       .00    .50     .44    .44

Note. I-type responses were student-initiated expressions, including procedural and course-content-related questions as well 
as declarative statements. A-trait refers to the students’ scores on the Trait Anxiety scale of the State-Trait Anxiety Inventory.
C-trait refers to the students’ scores on the Trait Curiosity scale of the State-Trait Curiosity Inventory. 


tions of A-trait, C-trait, and threat. The
verbal behavior data were evaluated by a 2
X 2 X 2 X 2 X 2 analysis of variance for re-
peated measures in which A-trait, C-trait,
threat, and sex were between-subjects vari-
ables, and periods was the within-subjects
variable. A least-squares estimate analysis
(Winer, 1971, as modified by Sampson, 1975)
was required because of disproportionate cell
frequencies resulting from the moderate
negative correlation between A-trait and
C-trait (r = -.54) in this study. Subsequent
tests employed the least-significant differ-
ence measure (Li, 1969), using a two-tailed
probability distribution.

In the analysis of I-type verbal behavior,
the most important finding was the signifi-
cant C-Trait X Threat interaction, F(1, 95)
= 5.27, p < .05. The C-trait, F(1, 95) = 7.99,
p < .01, threat, F(1, 95) = 4.62, p < .05, and
sex, F(1, 95) = 5.22, p < .05, main effects
were also significant.

Given the significant sex main effect, the
C-Trait X Threat interaction is presented
separately for males and females in Figure
1. Although the mean scores for males were
substantially higher than for the corre-
sponding female groups, the form of the in-
teraction was quite similar for both sexes.
For both males and females, high C-trait
students gave substantially more responses
than low C-trait students. Furthermore, the
differences between high and low C-trait
students of both sexes were much greater in
the low-threat condition. In the high-threat
condition, high C-trait males gave more than
twice as many responses as low C-trait males,
whereas females gave very few responses ir-
respective of their level of C-trait.

The means and standard deviations for
R-type verbal behavior scores are presented
in Table 2. In general, the response rates for
females were somewhat lower than for males.
When these data were evaluated by the same
type of analysis that was employed for I-type
verbal behavior, the only significant finding
was the four-way interaction of A-Trait X
Threat X Sex X Periods, F(1, 95) = 6.15, p
< .05.

Given the complexity of this interaction
for R-type verbal behavior, separate analyses
were subsequently performed for males and
females. For the males, the A-Trait X
Threat X Periods interaction, F(1, 95) =
5.01, p < .05, was the only significant finding.
There were no significant findings for fe-
males. Thus, the only contribution of the
females to the four-way interaction appears
Figure 1. Mean I-type verbal responses for high and low C-trait males and females as related to threat.


(I-type responses were student-initiated expressions, including procedural and course-content-related 
questions as well as declarative statements. C-trait refers to the students’ scores on the Trait Curiosity 


scale of the State-Trait Curiosity Inventory.) 


to be their low overall level of R-type verbal 
behavior. 
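
The degrees of freedom in the F(1, 95) tests reported above can be checked against the sample description. The sketch below assumes the usual bookkeeping for a design with four two-level between-subjects factors, where the error term has N minus the number of cells degrees of freedom; the variable names are illustrative only.

```python
from math import prod

n_students = 61 + 59 - 9              # complete data minus students at the Threat median
between_factor_levels = [2, 2, 2, 2]  # A-trait, C-trait, threat, sex

n_cells = prod(between_factor_levels)
error_df = n_students - n_cells  # between-subjects error df
effect_df = 2 - 1                # any effect built from two-level factors
print(n_cells, error_df, effect_df)
```

With 111 students spread over 16 cells, the error term has 95 degrees of freedom. Because periods has only two levels, the within-subjects error term has the same 95 degrees of freedom, which matches the F(1, 95) values reported for effects involving periods.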

The significant A-Trait X Threat X Peri- 
ods interaction for the male students is 
presented in Figure 2. In general, males who 
perceived their instructors as high threat 
tended to respond at a lower rate than did 
the males who perceived their instructors as 
low threat, irrespective of their A-Trait scale 
scores. Both low and high A-trait males who 
perceived their instructors as highly 
threatening responded at low rates during 
Period 1, but low A-trait males showed an 
increase in R-type responses as the course 
progressed, whereas the response rate for the 
high-A-trait-high-threat males remained 
low. In contrast, the high-A-trait-low- 

threat males tended to show an increase in 
R-type responses as the course progressed, 


whereas the response rate for the low-A- 
trait-low-threat males tended to decrease. 


Discussion 


In the present study, students who rated 
their instructors as threatening gave sub- 
stantially fewer verbal responses than did 
students who rated instructors as non- 
threatening. For students who perceived 
their instructors as nonthreatening, those 
with high trait curiosity initiated more than 
five times as many questions and declarative 
statements (I-type responses) than did 
low-C-trait-low-threat students. In con- 
trast, when the student rated the instructor 
as threatening, the response rates for stu- 
dent-initiated verbal behaviors were low, 
regardless of level of trait curiosity. Thus, 
Table 2
Means and Standard Deviations for R-type Verbal Responses Emitted in Periods 1 and 2 as a
Function of A-Trait, C-Trait, and Threat for Males and for Females


Low threat High threat 
Low A-trait High A-trait Low A-trait High A-trait 
C-trait 1 2 1 2 1 2 1 2 
Males 
High 
a M 1.90 1.63 1.33 3.00 55 1.33 50 00 
SD 4.30 3.82 2.30 2.64 .88 2.69 .70 00 
Low 
M 45 .00 1.25 1.37 44 ht A5 .30 
SD 1.50 .00 3.53 3.15 .97 Nb 37 63 
Females 
High 
M 60 1.80 .55 33 10 .20 .25 50 
SD 1.26 2.82 1.01 50 31 A2 50 57 
Low 
M 15 25 50 12 00 00 00 00 
* SD 50 .50 00 .00 .00 00 


Note. R-type responses were voluntary responses by students to the instructors’ questions. A-trait refers to the students’ scores
on the Trait Anxiety scale of the State-Trait Anxiety Inventory. C-trait refers to the students’ scores on the Trait Curiosity scale
of the State-Trait Curiosity Inventory.


student-initiated verbal behavior for the
highly curious students is apparently in-
hibited by perceptions of the instructor as
threatening. Since students who were low
in trait curiosity displayed low rates of
question-asking behavior regardless of level
of perceived instructor threat, it would ap-
pear that low-curious students have little
motivation to explore their environments
through asking questions.

The question-asking behavior of college
students was found by Evans (1971) to be
positively correlated with curiosity. The
results of the present study suggested,
however, that this relationship holds only for
students who perceive their instructors as
nonthreatening. For those who perceived
their instructors as threatening, there were
no differences between students who were
high and low in C-trait. While these pat-
terns of student-initiated verbal behaviors
were similar for both sexes, males consis-
tently initiated more questions and declar-
ative statements than females, as may be
noted in Figure 1.

Whereas curiosity interacted with per-
ceived instructor threat to influence stu-
dent-initiated verbal responses, trait anxiety
interacted with perceived instructor threat
to influence responses to questions asked by
the instructor. However, this interaction
was found for males but not for females and 
involved differential changes over time. 
The females answered so few questions that 
it was not possible for the personality vari- 
ables to influence their response rates. 

As may be noted in Figure 2, the high- 
A-trait-high-threat males answered very few 
instructor questions, either at the beginning 
or at the end of the observation period. The 
response rates for the low-A-trait-high- 
threat males were initially quite similar to 
their high A-trait counterparts, but these 
students gave substantially more answers to
instructor questions as the course
progressed. Males who rated their in-
structors as nonthreatening answered sub-
stantially more questions than did males
who rated their instructors as threatening, 
and high-A-trait-low-threat males tended 
to respond to fewer questions than did 
low-A-trait-low-threat males at the begin- 
ning of the observation period. However, 
with the passage of time, the high-A-trait- 
low-threat students answered more in- 
structor questions, whereas the number of 
questions answered by students in the low- 
A-trait-low-threat group decreased as the 
course progressed. 

The low rate of responding to instructor 
Figure 2. Mean R-type verbal responses given in Pe-
riods 1 and 2 for high and low A-trait males as related
to threat. (R-type responses were voluntary responses
by students to the instructors' questions. A-trait refers
to the students' scores on the Trait Anxiety scale of the
State-Trait Anxiety Inventory.)


questions that characterized the verbal be- 
havior of the high-A-trait-high-threat group 
was consistent with trait-state anxiety 
theory (Spielberger, 1966, 1972). As ex- 
pected, the combined influence of perceived 
instructor threat and trait anxiety, that is, 
the tendency to perceive evaluative situa- 
tions as dangerous, inhibited the question- 
answering behavior of these students 
throughout the period of observation. 
While the low-A-trait-high-threat males 
were initially inhibited by their high level of 
perceived threat, these students apparently 
adapted to the evaluative situation, and their 
question-answering behavior increased from 
the initial to the final observation period.

It would appear that males who perceived 
their instructors as nonthreatening viewed 
the classroom situation as relatively safe and 
responded to instructor questions at a high 
rate throughout the observation period. 
The increased rate of responding for the 
high-A-trait-low-threat males suggested 
that these students adapted to the novel 
situation as the course progressed, and their 
lower levels of apprehension permitted in- 
creasingly greater class participation. On 
the other hand, the low-A-trait-low-threat 
males, who were initially the most produc- 
tive in answering more questions than any 
other group, appeared to lose interest as the 
course progressed, as reflected in decreasing 
class participation over time. 

In the present study, the finding that 
males who perceived their instructors as 
nonthreatening answered more questions 
than males who perceived their instructors 
as threatening was consistent with Kelly's 
(1950) finding of greater student classroom 
participation with perceptions of warm 
rather than cold instructors. Furthermore, 
the results for students who perceived their 
instructors as threatening were also consis- 
tent with Williams's (1971) finding of less 
classroom participation for high-anxious 
(neurotic) students than for low-anxious 
(well-adjusted) students. 

The differential impact of anxiety and 
curiosity on student verbal behavior in this 
study suggests that asking or answering 
questions in the classroom may be governed 
by different needs. Asking questions ap- 
parently satisfies a “need to know,” that is, 
the curiosity drive, but curiosity has little 
effect upon the motivation to answer ques- 
tions posed by others. In contrast, re- 
sponding to an instructor’s questions in an 
evaluative situation is apparently governed 
more by trait anxiety than by trait curiosity. 
According to trait-state anxiety theory 
(Spielberger, 1966, 1972), persons who are 
high in trait anxiety are more prone to per- 
ceive evaluative situations as threatening 
and to respond to such situations with ele- 
vations in state anxiety, that is, with feelings 
of tension and apprehension that inhibit 
their responses to the instructor's ques- 
tions. 


The finding that males gave many more 
responses than females was not expected. 
The research literature offers conflicting 
data with respect to sex differences in stu-
dents’ question-asking behavior. Some
studies (e.g., Davis, 1932; Smith, 1933) reveal
that the frequency of types of questions
differs for boys and girls, while other inves-
tigators (e.g., Meyer & Shane, 1973) report
no sex differences in the frequency of ques- 
tion-asking behavior. Furthermore, this 
literature is based primarily on the responses 
of preschool and elementary school children, 
so that generalization to the present subject 
population is questionable. 

A possible explanation of the finding in 
the present study that males asked and an- 
swered more questions than females may be 
related to the fact that all of the instructors 
were male. This may have exerted an in- 
hibiting effect on the female students. 

In conclusion, the results of this study 
would seem to demonstrate that personality 
variables interact with perceptions of in- 
structor threat to influence the questions 
asked by college students and the answers 
that they volunteer in response to instructor 
questions. It would appear that the verbal 
behavior of students depends on their per- 
ceptions of their instructors as more or less 
threatening. Furthermore, perceived in- 
structor threat interacts with curiosity in 
determining the question-asking behavior of 
males and females and with anxiety in de- 
termining male students’ responses to 
questions asked by their instructors. 
Correlates of Social Status


Mentally Retarded Children 
The relative contributions of misbehavior, academic incompetence, and expo-
sure to nonretarded children to the explanation of retarded children's socio- 
metric status were explored. Teachers and peers rated retarded children on the 
dimensions of misbehavior and academic performance. The results indicated 
that perceived academic incompetence was associated with educable mentally 
retarded children's level of social acceptance, whereas perceived misbehavior 
was associated with retarded children's social rejection by peers. Amount of 
exposure to nonretarded children did not relate significantly to retarded chil-
dren’s social status. The data were discussed in terms of the assumptions
underlying the mainstreaming of retarded children into regular classes.


One of the central propositions upon
which the mainstreaming movement flour-
ishes is the belief that by removing educable
mentally retarded (EMR) children from
special classes and placing them in the reg-
ular grades, the stigma that accompanied
their segregated placement would be re-
duced and their social acceptability to their
non-EMR peers would be improved (e.g.,
Christoplos & Renz, 1969; Fischer & Rizzo,
1974). However, investigations that have
examined the comparative social position of
EMR children in segregated and regular
classes have failed to substantiate the as-
sertions regarding the ability of regular class
placement to enhance the social status of
EMR children (Goodman, Gottlieb, & Har-
rison, 1972; Gottlieb & Budoff, 1973; Iano et
al., 1974). These investigations have gen-
erally indicated that integrated EMR chil- 
dren are not as well accepted as EMR chil- 
dren who remain in segregated programs. 
A major limitation of previous investiga- 
tions has been that they shed little light on 
the reasons why EMR children have tradi- 
tionally occupied an inferior social status 
within their peer group. Given the abun- 
dant empirical evidence that EMR children 
are not well accepted socially, regardless of 
their class placement (see Gottlieb, 1975b, 
for a review), a detailed understanding of the 
specific reasons why they are not accepted 
is essential if we are to develop successful 
strategies to improve their social status.
The few attempts to improve the social sta-
tus of EMR children have been generally
unsuccessful in securing durable improve- 
ments (Lilly, 1971; Rucker & Vincenzo, 
1970), in part because the investigations did 
not consider the reasons for the EMR chil- 
dren's initial nonacceptance. The present 
investigation was initiated in order to study 
possible influencing factors in the social 
status of EMR children. EMR children's 
perceived social behavior, their academic 
competence, and the amount of time they 
were integrated into regular grades were 
studied to determine their relative influence
on the social acceptability of EMR children.

There is little doubt that the observable 
behavior one child manifests in the presence 
of others affects his social status; what is in 
doubt is the influence of different types of 
behavior on the EMR child's social status. 
Basically, two broad categories of behavioral 
expression have been found to relate signif- 
icantly to EMR children's social status: 
academic ability (e.g., Gottlieb, 1974; Sip-
erstein, Bak, & Gottlieb, 1977) and misbe-
havior (Baldwin, 1958; Gottlieb, 1975a; 
Johnson, 1950). Since academic ability and 
misbehavior have seldom been studied 
concurrently in a single investigation, we do 
not know whether the two categories of be- 
havior interact to affect social status, nor 
whether one of the two categories is more 
closely related than the other. This infor- 
mation has implications for designing 
strategies to improve EMR children’s social 
status; if social status is more closely asso- 
ciated with one behavioral category than 
another, interventions aimed at improving 
social status may only have to concentrate on 
modifying the most highly correlated be- 
havior. Behavior that has a low correlation 
with social status may require no modifica- 
tion, since the improved behavior may have 
no influence on social status. 

In the present investigation, perceptions 
of EMR children’s behavior were obtained 
both from non-EMR peers, who provided 
the social choice data that served as depen- 
dent measures, and from the EMR pupils’ 
teachers. This procedure was employed not 
only to validate peers’ responses to the be- 
havior of retarded classmates but also to 
determine the relative power of teachers’ and 
non-EMR peers’ perceptions to predict 
EMR pupils’ social status. While some data 
do exist that suggest that teachers and peers 
have similar preferences regarding children 
(Gronlund, 1953), the investigators are un- 
aware of any studies that have examined 
whether teachers and peers share similar 
perceptions regarding the specific behavior 
of children nor how these perceptions relate 
to social status. 

Of parallel concern with the identification 
of behavior that is most highly related to 
social status is the nature of the relationship 
at different levels of social status. For ex-
ample, if positive social behavior is associ-
ated with high social status, is the lack of 
positive social behavior necessary and suf- 
ficient to cause low social status? Available 
evidence suggests that the absence of posi- 
tive social behavior may be a necessary pre- 
cursor to low social status, but that it is not 
sufficient. In a study of non-EMR junior 
high school pupils, Gronlund and Anderson 
(1957) found that while socially accepted 
children were nominated as possessing pos- 
itive social behavior, rejected children were 
not only nominated as lacking these positive 
behaviors but, in addition, were perceived to 
manifest negative behavior, such as being 
talkative and restless. In other words, social 
rejection was related to the expression of 
negative behavior and not to simply the ab- 
sence of positive behavior. Such data indi- 
cate that social acceptance and social rejec- 
tion may not represent a single continuum 
but may reflect two relatively distinct di- 
mensions. Additional evidence supporting 
the need to consider acceptance and rejec- 
tion separately is available from Bryan (Note 
1), who reported that the correlation be- 
tween acceptance and rejection among 
learning-disabled children was only —.20, 
and from Trent (1957), who found that 
among delinquent adolescent boys, there was 
a significant positive correlation between 
being liked and disliked; boys who were liked 
more frequently were also disliked more 
frequently. These data do not support the 
assumption that the more groups of subjects 
are liked, the less they are disliked. If ac- 


derson (1957) data would generate the pre- 
diction that the perceived misbehavior of 
children would be associated with social re- 
jection but not necessarily low social accep- 
tance. 

Similar predictions cannot be advanced 
for the relationship between academic performance
and social rejection, since the data
that have typically related academic performance
and social status have been primarily
concerned with social acceptance, not social
rejection. The majority of data suggests
that academically incompetent children re- 
ceive few social acceptance choices, but little 
information is available regarding the frequency
of active social rejection nominations.
Johnson’s (1950) data suggest, however,
that academic incompetence is not associated
with active social rejection.

Among school children, the behavior that 
one child manifests in the presence of others 
operates in the context of a particular edu- 
cational program and environment. The 
social status of EMR children in a peer group 
comprised mainly of nonretarded children 
is determined in large part by the opportu- 
nities that group members have to observe 
the behavior of the EMR child. Two competing
hypotheses can be raised regarding
the effects of observed behavior on social 
status. The first is that the more opportu- 
nity non-EMR children have to observe the 
behavior of mentally retarded classmates, 
the more likely they will be to base their so- 
ciometric ratings of them on information 
gained through their observations. There- 
fore, the more times that non-EMR children 
see the EMR child displaying socially ap- 
propriate behavior, the more likely they will 
be to rate the child positively. On the other 
hand, the more undesirable behavior non- 
EMR children observe, the more likely they 
will be to rate the EMR child unfavorably. 
An alternative hypothesis can be ad- 
vanced that the amount of time non-EMR 
children observe retarded pupils will not be 
related to their ratings. This hypothesis 
derives from literature indicating that people 
attribute a wide variety of characteristics to 
others on the basis of their first impressions 
and do not require much interaction with 
others before they are able to construct 
elaborate and durable conceptions about 
them (e.g., Kleck, Richardson, & Ronald, 
1974). To the extent that non-EMR chil- 
dren’s sociometric ratings are primarily in- 
fluenced by their initial impressions of EMR 
children, opportunities for additional con- 
tact with EMR children would be expected 
to affect children’s ratings of retarded 
classmates only minimally. 

To summarize, the present investigation 
was concerned with explaining why EMR 
children occupy a low social status. The 
main hypothesis tested was that the per- 
ceived behavior of EMR children would 
significantly affect their social status scores, 
with perceived misbehavior being more
closely associated with social rejection than
academic incompetence would be. No specific
prediction was advanced regarding the
effect on social status of the amount of time
that EMR children were integrated.


Method 


Subjects¹


The subjects who participated in this investigation 
were 324 EMR children (60% male and 40% female) who 
had been integrated by chronological age into third-, 
fourth-, or fifth-grade regular classes for differing 
amounts of time. Approximately 70% of the subjects 
had previously attended special classes. Pupils were 
sampled from 152 school buildings located in 42 school 
districts or cooperatives. Subjects were not randomly
assigned to amount of time integrated but were selected
from ongoing programs in intact classrooms.

The EMR subjects in this investigation ranged in age 
from 8 to 15 years. The mean chronological age of this 
sample was 10.7 years. Their racial distribution was as 
follows: 34% black, 45% Chicano, and 21% white. 
Teachers estimated that 71.3% of the EMR sample 
came from low socioeconomic status homes, 21.1% from 
lower-middle-class backgrounds, with the remainder 
(7.6%) coming from middle-class and upper-middle- 
class backgrounds. 

The 324 subjects represented a subsample of 723 
EMR pupils who comprised the total EMR population 
of a larger study directed toward the investigation of the 
effects of mainstreaming. Of the 723 subjects, 267 were 
enrolled in special classes with fewer than 16 children and
were therefore excluded from this study. Of the remaining
456 EMR pupils, the sample of 324 constituted
all pupils in regular classes for whom complete data files
were available.


Test Procedures 


Dependent measures: Acceptance and rejection. 
The dependent measures were derived from the near-sociometric
scale How I Feel Toward Others (Agard &
Harrison, Note 3), which was administered during a
single session to entire classrooms of children including
the EMR pupils. Each pupil was presented with a list
of names of his classmates. To the right of each name
appeared four figures: a smiling face, a straight- 
mouthed face, a frowning face, and a question mark. 
Pupils were asked to color in the one of the four figures 
that best described the way they felt about each child 
in their class. The exact instructions were as follows: 


¹ The subjects who participated in this study were
drawn from samples used in Project PRIME (Kaufman,
Agard, & Semmel, Note 2). The authors express their
appreciation to these investigators for permission to use
the data.




The face with the smile stands for the classmates 
who are your friends. You will color in the smiling 
face only after the names of those children who are 
your friends. 


The face with the straight mouth stands for your 
classmates that you know pretty well but whom you 
don’t especially care about. You will color in the 
face with the straight mouth after the names of 
these children. 


The face with the frown stands for children you do 
not want to have as friends as long as they are like 
they are now. These children may be all right in 
some ways and may be good friends with other children
but not with you. You will color in the frowning
face to the right of the names of the children
who are not your friends.


The circle with the question mark stands for your 
classmates you don’t know very well. It may be 
that you have not been with them enough to tell 
much about them. You will color in the circle with 
the question mark after the names of those children 
whom you do not know very well. 


Four scales were derived from these data (Veldman
& Sheffield, Note 4), two of which are reported here.
The first scale was Social Acceptance (percentage of
smiles received) and was calculated as the number of
smiling responses the EMR pupil received from his
classmates divided by the total number of responses he
received to all four figures. The second scale was Social
Rejection (percentage of frowns received), which was
calculated as the number of frown choices divided by 
the total number of choices. Reliability estimates of 
the Social Acceptance and Social Rejection scales were 
16 and .74, respectively, and were obtained from the 
correlation of split-class scores. Additional information
regarding the reliability and validity of the Social Acceptance
and Social Rejection scales is available elsewhere
(Agard, Veldman, Kaufman, & Semmel, in
press).
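
The two scale definitions above reduce to simple proportions. The following is a minimal Python sketch of that arithmetic; the function name `status_scores` and the response labels are illustrative assumptions and are not taken from the instrument's own scoring materials.

```python
from collections import Counter

def status_scores(responses):
    """Return (acceptance, rejection) percentages for one pupil.

    `responses` lists the figure each classmate colored in for this pupil.
    The labels "smile", "straight", "frown", and "question" are hypothetical.
    """
    counts = Counter(responses)
    total = sum(counts.values())  # responses to all four figures
    if total == 0:
        raise ValueError("pupil received no responses")
    acceptance = 100.0 * counts["smile"] / total  # percentage of smiles
    rejection = 100.0 * counts["frown"] / total   # percentage of frowns
    return acceptance, rejection

# Hypothetical pupil rated by 20 classmates
example = ["smile"] * 8 + ["straight"] * 5 + ["frown"] * 4 + ["question"] * 3
print(status_scores(example))  # (40.0, 20.0)
```

Note that both scores share the same denominator, so "don't know" responses lower acceptance and rejection alike.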

Independent variables: Peers’ perceptions of cognitive
ability and disruptive behavior. Peers’ and
teachers’ perceptions of EMR pupils’ behavior were
used as independent variables. Peer perceptions were 
obtained from responses to the Guess Who? question- 
naire, which required every child in a class to name one 
classmate who best typified the behavior expressed in 
each questionnaire item. 

Factor analysis of the 29-item instrument revealed
four factors, two of which are of concern to this report.
The two factors, consisting of 10 and 5 items, respectively,
were labeled Disruptive and Dull. The number
of items on which an EMR child was chosen by at least
one peer as disruptive or dull was his score on that factor.
The items comprising the Disruptive and Dull
factors appear in Table 1. The reliability (alpha)
coefficients for the Disruptive and Dull scales were .92
and .85, respectively. Additional information regarding
the construction of the Guess Who? scales, including
validating data, is available in a paper by Veldman and
Sheffield (in press), in which the authors compared
various methods of scoring nomination data and demonstrated
the superiority of the binary method used here.
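
The binary scoring rule described above (one point for each factor item on which the child was named by at least one peer) can be sketched as follows. The function name and the item identifiers are hypothetical and serve only to illustrate the rule.

```python
def binary_factor_score(nominations, factor_items):
    """Count the factor items on which the child was named by >= 1 peer.

    `nominations` maps an item id to the number of peers who named the
    child on that item; `factor_items` lists the ids in one factor.
    """
    return sum(1 for item in factor_items if nominations.get(item, 0) >= 1)

# Hypothetical ids standing in for the five Dull items of Table 1
dull_items = ["math", "answers", "reading", "slow", "late_work"]
child = {"math": 3, "reading": 1, "late_work": 0}
print(binary_factor_score(child, dull_items))  # 2
```

Under this rule, three nominations on one item count the same as a single nomination, which is what distinguishes binary scoring from a raw nomination count.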
Table 1
Peer Nomination Scales for Disruptive and 
Dull Factors 


Disruptive 
Who does the teacher have to scold all the time? 
Who is always bothering other children? 
Who makes too much noise in class? 
Who breaks the rules? 
Who bothers the teacher all the time? 
Who makes fun of other children? 
Who always wants their own way? 
Who gets into a lot of fights? 
Who likes to boss others around? 
Who does not work well with others? 


Dull 
Who is the worst in math? 
Who never knows the answers in class? 
Who is the worst in reading? 
Who learns new things very slowly? 
Who never gets their school work done on time? 


Teachers’ perceptions of cognitive ability and misbehavior.
Teachers’ perceptions of the EMR children’s
behavior were obtained from the Teacher Rating Scale, 
an 85-item instrument, to which all teachers who came 
in contact with the retarded child responded. Each 
item was scored on a 5-point basis (from always to 
never). A pupil’s score on each item was derived by 
from all the teachers’ ratings. 
Factor analysis of this instrument resulted in the
emergence of four factors, two of which were concept- 
ually similar to factors that emerged from the factor 
analysis of the Guess Who? scale and were therefore 


Table 2
Items Comprising Misbehavior and Academic
Concentration Scales from Teacher Ratings


Academic Concentration
Who needs constant supervision to complete school work?
Who finishes work on time?
Who can concentrate on tasks for long periods?
Who has difficulty keeping mind on school work?
Who receives better than average grades on his or her


Misbehavior
Who attempts to dominate or bully other children?
Who is well-behaved in school?
Who is boisterous, shows off?
Who resists class limits or rules?
Who gets into fights?
sistency for this scale was also .98. Selected items for
each scale appear in Table 2. Additional validity and
reliability data for the Teacher Rating Scale appear in 
a technical manual (Agard, Veldman, Kaufman, Sem- 
mel, & Walters, in press). 

Scores for the Misbehavior scale had a range of 33 to 
165. The 40-item Academic Concentration scale had 
a range in scores of 40 to 200. The higher score was 
associated with the scale name on all instruments. 
Additional data on scale construction procedures have 
been presented by Veldman (Note 5). 

Hours of integration. Information pertaining to the 
amount of time that EMR pupils were integrated into 
regular classes was obtained principally from the 
teacher who was most familiar with the child's schedule. 
Teachers were asked the number of hours each week 
that the EMR child spent in all subject areas with reg- 
ular class children. The number of hours integrated 
each week for academic subject matter areas was used 
as the measure of interest. 


Statistical Procedures 


In order to determine the variance that teachers’ and 
peers’ perceptions contributed to the social status of the 
EMR children, separate commonality analyses were
computed for the social acceptance (percentage smiles
received) and social rejection (percentage frowns
received) dependent measures. Commonality analysis
is an extension of multiple regression techniques, which 


tor sou Commonality 
analysis identifies the criterion variance associated with 


ings (teachers and peers) and the type of behavior perceived
(academic ability and misbehavior).
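
For readers unfamiliar with the technique, the partition that commonality analysis performs can be illustrated for the simplest case of two predictors. The sketch below is an illustration under stated assumptions, not the authors' five-variable analysis: it uses the standard formula for the squared multiple correlation from three zero-order correlations, and the function name and example correlations are hypothetical.

```python
def commonality_from_correlations(r_y1, r_y2, r_12):
    """Partition R^2 for two predictors into unique and common parts.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors (must not be +/-1).
    """
    # Squared multiple correlation for the full two-predictor equation
    r2_full = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    # Unique variance: drop in R^2 when that predictor is omitted
    unique_1 = r2_full - r_y2**2
    unique_2 = r2_full - r_y1**2
    # Common variance: what the two predictors explain jointly;
    # it can be negative when one predictor acts as a suppressor
    common = r_y1**2 + r_y2**2 - r2_full
    return {"r2_full": r2_full, "unique_1": unique_1,
            "unique_2": unique_2, "common": common}

# Hypothetical correlations, chosen only for illustration
parts = commonality_from_correlations(0.5, 0.4, 0.3)
print({name: round(value, 3) for name, value in parts.items()})
```

The three components always sum to the full R², which is why negative shared variances can appear in a commonality table.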

The commonality analyses employed in this inves- 
tigation included five independent variables: (a) peers’ 
perceptions of disruptive behavior and (b) cognitive 
ability (dull); (c) teachers’ perceptions of misbehavior 
and (d) academic concentration; and (e) the hours of 
academic integration, a set of two variables representing 
the linear and quadratic components of that variable.
The quadratic component of the integration variable 
was included because prior research indicated that so- 


Table 3
Means and Standard Deviations for Teachers’ and Peers’ Perceptions and
Amount of Time Integrated

            Teacher rating            Peer rating             Hours        Hours
Statistic   Cognitive  Misbehavior   Cognitive  Misbehavior   integrated   squared   Acceptance   Rejection
M             99.59       75.01         2.94        3.79        13.44       214.26      38.38        26.20
SD            20.21       19.17         1.67        2.93         5.81       161.12      18.09        16.60


amount of variance in the criterion, that variable was
omitted from the full equation; the decrease in R² is the
percentage of variance explained by the deleted variable.
The multiple regression equations contained the
same variables as the commonality analyses but were
computed because of the difficulty in interpreting
interaction terms in commonality analysis.


Results


included teachers’ perceptions of EMR
children’s cognitive competence and misbehavior
and peers’ ratings of EMR children’s
cognitive incompetence and misbehavior.
The integration variable included
the linear and quadratic components of the
amount of time integrated in academics.
Means and standard deviations for the relevant
variables appear in Table 3. The
corresponding intercorrelation matrix appears
in Table 4.

As is evident from Table 4, teachers’ and
peers’ perceptions of EMR pupils’ misbehavior
and cognitive ability were significantly
correlated. Teachers’ and peers’
ratings of EMR children’s misbehavior cor-

Table 4
Intercorrelation Matrix of Variables of Teachers’ and Peers’ Perceptions and
Amount of Time Integrated


Variable 1 2 3 4 5 6 7 8 
1, Teacher rating cognitive NS Ee eA rui 85, —30 
2. Teacher rating misbehavior — 48 57 -04 -0 3-20 AL 
3. Peer rating cognitive — .30 45 .12 —.28 .20 
4. Peer rating misbehavior — 20012 —04  —J1 32 
5. Hours integrated — 97 01 —.04 
6. Hours squared — 04  —04 
7. Acceptance — —.67 
8. Rejection =? 
Note. N = 324. 


related .57, while their ratings of EMR pu- 
pils’ cognitive ability correlated —.39, both 
coefficients being statistically significant (p
< .001).² These can be taken as validity
coefficients for perceptions of teachers and
peers.

The five independent variables together
explained 15.7% of the variance in social
acceptance. Teachers' perceptions of the
EMR pupils’ cognitive competence (academic
concentration) contributed 4.6%
unique variance (p < .0002) in the dependent
measure. Peers’ nominations of
children’s cognitive incompetence (dull) were
the only other significant predictor of social
acceptance, accounting for 2.2% unique
variance (p < .005). Including the variance
shared by teachers' and peers’ ratings, EMR
children’s perceived cognitive ability accounted
for 10.4% of the variance in social
acceptance. Neither teachers’ nor peers’ 
ratings of EMR children’s misbehavior 
(misbehavior and disruptive, respectively) 
accounted for significant amounts of vari- 
ance in the social acceptance criterion. 
Numbers of hours integrated also failed to 
account for significant amounts of variance 
in social acceptance. 


The second commonality analysis was
performed on the criterion of social rejection
(percentage of frowns received). The
identical predictors were entered into this
equation as were entered into the previous
one. The full equation explained 18.1% of
the variance in social rejection. Teachers’
ratings of misbehavior predicted 2.2% unique
variance in the social rejection criterion
(p < .004), while their ratings of EMR children's
cognitive ability accounted for 1.6%
unique variance (p < .02). Including the
variance shared by teachers’ and peers’ ratings,
misbehavior as a set accounted for 7.9%
of the variance in social rejection. Peers’
ratings of EMR pupils’ misbehavior also
contributed a significant amount of variance 
(1.4%, p < .02), while their rating on the 
cognitive scale did not (.3%). Amount of 
time integrated did not account for signifi- 
cant amounts of variance in the social re- 
jection criterion. 

The commonality analyses indicate that, 
overall, EMR children’s misbehavior as 
perceived by teachers and peers was a better 
predictor of rejection than was perceived 
cognitive ability. On the other hand, per- 
ceived cognitive ability was a better predic- 
tor than was perceived misbehavior of social 
acceptance scores. The data also revealed 
that in all instances where teachers and peers 
rated along a similar dimension (cognitive or 
behavioral), the teachers’ ratings predicted 
more variance in social status even though 
social status scores were obtained from peer
judgments. A summary of the commonality 
analyses appears in Table 5. 
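
The set-level figures quoted in the text (10.4% for perceived cognitive ability in acceptance, 7.9% for misbehavior in rejection) appear to be the sum of the two unique variances and the variance the two raters share. The short check below assumes that reading; the values are the percentages reported in Table 5, and the variable names are illustrative.

```python
# Percentages of criterion variance as reported in Table 5
# A, B = teacher ratings (academic concentration, misbehavior)
# C, D = peer ratings (dull, disruptive)
cognitive_in_acceptance = {"A": 4.6, "C": 2.2, "A,C": 3.6}
misbehavior_in_rejection = {"B": 2.2, "D": 1.4, "B,D": 4.3}

cognitive_set = round(sum(cognitive_in_acceptance.values()), 1)
misbehavior_set = round(sum(misbehavior_in_rejection.values()), 1)
print(cognitive_set, misbehavior_set)  # 10.4 7.9
```

In both cases the shared component is as large as or larger than either unique component, which reflects the substantial agreement between teachers' and peers' ratings.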

Two multiple regression equations were 
solved in order to determine whether 
teachers’ and/or peers’ ratings of EMR 
children’s behavioral and cognitive ability 
interacted with each other and with the 
amount of time the latter were integrated in 
regular classes. The first equation was 
computed for the acceptance criterion, while 


—— 

² The scaling for teachers’ and peers’ perceptions is
reversed. A high score for peers’ perceptions implies
incompetent ability, while a high score for teachers’ 
perceptions implies competent ability. 
the second was employed for the social rejection
criterion. The multiple regression
equations included all predictor variables
that were entered into the commonality
analyses as well as interaction terms between
teacher and peer nominations and hours
integrated. No significant interactions
emerged from either regression equation.
That is, when the variance attributable to
the interaction between number of hours
integrated and the set of four teacher and
peer nomination variables was dropped from
the full multiple regression model, a significant
reduction of explained variance did not
occur.


Table 5
Commonality Analysis for Unique and Sharedᵃ Variances Among Peer and Teacher
Nominations and Social Status

Source                          Unique variance      p

Social Acceptance (R² = 15.7%)
Academic Concentration (A)           4.6%          .0002
Misbehavior (B)                       .3%           ns
Dull (C)                             2.2%          .005
Disruptive (D)                        .1%           ns
Integration (E)                       .1%           ns
(A, B)                               2.2%
(A, C)                               3.6%
(A, D)                               −.2%
(B, C)                               −.2%
(C, E)                                .2%
(A, B, C)                            −.2%
(A, C, D)                            −.3%
(A, C, E)                             .4%
(B, C, D)                            −.2%
(A, B, C, D)                         1.5%

Social Rejection (R² = 18.1%)
Academic Concentration (A)           1.6%          .02
Misbehavior (B)                      2.2%          .004
Dull (C)                              .3%           ns
Disruptive (D)                       1.4%          .02
Integration (E)                       .1%           ns
(A, B)                               2.6%
(A, C)                                .9%
(A, D)                               −.3%
(B, C)                               −.2%
(B, D)                               4.3%
(C, D)                                .6%
(A, B, C)                             .2%
(A, B, D)                            1.8%
(A, C, D)                             .3%
(B, C, D)                             .3%
(A, B, C, D)                         2.1%

Note. Only sources contributing 1% or more of the explained variance (unique variance + total variance) are included in the
summary table.
ᵃ Shared variances appear in parentheses.


Discussion


The results of this investigation did not
support the contention that academic integration
per se significantly affects the social
status of EMR children; teachers’ and peers’
perceptions of EMR children’s behavior
were more important as factors related to
social status. Furthermore, the amount of
time that teachers and peers were able to
observe the behavior of EMR children did
not significantly relate to their social status.



Previous discussions of EMR children's 
inferior social status in segregated special 
classes have implicitly suggested that their 
social status would improve when they were 
removed from the segregated classes and
reintegrated into the regular grades (e.g., 
Dunn, 1968). The logical extension of such 
an argument is that EMR children's social 
status should improve as they spend in- 
creasing amounts of time in the regular 
classes. This was not found to be the case in 
the present investigation. The zero-order
correlation between number of hours inte- 
grated and social acceptance was .008, while 
the corresponding correlation coefficient 
between integration and social rejection was 
—.036, both coefficients being nonsignifi- 
cant. 
The fact that amount of academic time
integrated did not relate significantly to social
status could be explained in a number of
ways. First, the data for this investigation
were collected during January, 5 months
after the school year had begun. Second, the

the same classroom, for an average of 2.7
hours per day, may have provided the non-EMR
observers with a sufficient sampling of
the EMR children's behavior on which to
base their ratings. Support for this view is
offered by a large body of literature that has
indicated that people judge others on the
basis of very limited information and that
these first impressions are fairly resistant to
change (e.g., Kleck et al., 1974). The limited
information derived from these brief encounters
may then be used to construct
elaborate portraits of people. In the context
of the present study, the judgments that 
teachers and peers make about EMR chil- 
dren may take place relatively soon after the 
first encounter, and most additional expe- 
riences may be superfluous. A third possible 
explanation for the lack of a significant 
relation between amount of time integrated 
and social status may result from the fact 
that not all of the mentally retarded children 
were sent to regular classes at the same time. 
Approximately one third of the EMR pupils 
had been enrolled in regular classes during 
the year prior to the data collection. It is 
therefore conceivable that the overall length 
of time for which EMR children were in 
regular classes could have canceled out any 
effects attributable to the weekly amount of 
time integrated. This explanation is remote,
however, because the EMR children's instructional
setting during the previous year (measured
on a 5-point continuum from special education
all day to regular class all day) did not
correlate significantly with either social
status score.

This investigation did indicate that per- 
ceptions regarding EMR children's social 
behavior and academic competence were 
important predictors of the latter's social 
status, although the amount of variance in 
acceptance and rejection explained by the 
perceptions was relatively low. Within 
limitations imposed by a small but statistically
significant amount of variance explained,
the present data suggest the following
hypotheses: (a) Perceptions of misbehavior
are associated with social rejection
of EMRs and (b) perceptions of academic 
competence are associated with social ac- 
ceptance of EMRs. Such data suggest that 
future attempts designed to improve the 
social status of EMR pupils will have to 
consider more carefully than has been done
in the past (a) whether improved social sta- 
tus implies increased acceptance or de- 
creased rejection and (b) the specific be- 
havior that is associated with either of the 
two preceding social status constructs. To 
the extent that social acceptance and social 
rejection are not two ends of a single con- 
tinuum but represent independent con- 
structs that are influenced by different types 
of behavioral expression, it appears that it 
would be an easier task to reduce EMR 
children’s social rejection than to improve 
their social acceptability. If social rejection
is associated with misbehavior, it may be 
easier to modify this behavior than to teach 
the child the necessary academic skills that 
would result in improved acceptance. 

If, in fact, interpersonal attraction is de- 
pendent on the initial impressions of the 
observers, a logical strategy to improve the
social status of the EMR children would be 
to develop techniques designed to modify 
their behavior prior to placement in the 
regular classes. Once EMR children are 
integrated and are perceived to manifest 
inappropriate behavior, improving their 
social status may be a very formidable task. 
If preintegration training of EMR children 
is logistically difficult to implement, an al- 
ternative strategy could be to develop edu- 
cational programs within the context of the 
regular class curriculum that are designed to 
modify inappropriate behavior associated 
A number of investi- 


jected. One 
was developed by Ballard, Corman, 
and Kaufman 

The data of this study revealed that 
teachers and peers showed a high degree of 
agreement about the behavior and academic 
abilities of the EMR children in their classes. 
Evidence for teachers’ and peers’ agreement 
regarding EMR children’s academic and 
social behavior is especially strong in the 
present study, since both sets of raters provided
their nominations on questionnaires
that differed both in content and response 
formats. While previous research has in- 
dicated that teachers and peers can both 
identify children who are low in social status 
(Gronlund, 1953), the present investigators 
are not aware of existing data that indicate 
that teachers and peers share similar per- 
ceptions regarding the behavior of children 
that may be responsible for their status in 
the social hierarchy. Beyond the issue of 
shared perceptions of peers and teachers 
toward retarded children is the question as
to whether teachers influence peers’ per- 
ceptions or vice versa. This concern has 
important implications for designing inter- 
ventions to improve the social status of EMR 
children. If teachers are found to influence 
peers’ perceptions of EMR children, as the 
results of one study suggested (Semmel, 
Ballard, & Sivasailam, Note 7), attempts to 
improve retarded children’s social position 
could be directed toward changing teachers’ 
behavior rather than toward modifying the 
nature of the interactions between nonre- 
tarded and EMR children, which is currently 
the predominant approach to the problem 
(e.g., Rucker & Vincenzo, 1970). 

Lindzey and Byrne (1968) have argued 
that conclusions about the relationship be- 
tween personality variables and social choice 
can be obtained only when the personality 
ratings are provided by someone other than 
the individuals making the sociometric rat- 
ings. In the context of the present investi- 
gation, such an argument implies that the 
inclusion of the peers’ ratings of dull and 
disruptive is of little significance, since they
are used to predict their own ratings of social 
status. In order to determine whether different
patterns among cognition, misbehavior,
and social status would emerge when
peers' ratings were omitted, separate commonality
analyses were performed for social
acceptance and social rejection, with each 
equation including only teachers’ ratings of 
academic concentration and misbehavior as 
well as number of hours integrated. Results 
of these post hoc analyses revealed a similar 
pattern of findings to those including both 
teacher and peer ratings. Teacher ratings
of cognition accounted for 8.2% of the unique
variance in social acceptance, while their
ratings of misbehavior predicted 6.4% of the
unique variance in social rejection. On the
other hand, teacher ratings of cognitive
ability accounted for 2.5% of the unique
variance in rejection, while their ratings of
misbehavior accounted for only .3% of the
unique variance in acceptance.

A variety of questions are suggested by the
present data. First, descriptions of EMR
children's cognitive and social behavior were
obtained from teachers’ and peers’ verbal
nominations. The validity of these perceptions
in terms of the retarded children’s
observable behavior must be established.
To what extent do perceptions of EMR
children's behavior actually reflect their
daily behavior?

Second, with what frequency must particular
behavior occur in order to affect


iT 


sions do not contribute equally to the retarded
child's perceived misbehavior. For
example, sporadic instances of teasing behavior
by the EMR child may not be viewed
as particularly serious by non-EMR peers.
Blatant physical aggression, on the other
hand, probably need occur only once to
brand the transgressor as a misbehaving
child worthy of social rejection. In other
words, further research must be directed to
the relationship between the severity of the
misbehaving act and the frequency with
which it occurs to the EMR child’s social
rejection. A parallel question concerns
whether the nature of this relationship is
similar for retarded and nonretarded children


him, there is very little information regard- 
ing the internaliza: 


by the EMR child. 
Role of Intelligence and Task Difficulty in the
Affective Learning Styles of Children
with High and Low Self-concepts


Gerald J. August
Pennsylvania State University
Shenango Valley Campus

Purdue University


Fifth-grade children prerated both abstract and concrete nouns for likability,
and paired-associate lists were constructed by pairing nouns (liked with liked
and disliked with disliked). As predicted, the high self-concept children
learned their liked noun pairs more efficiently than their disliked pairs, while
the low self-concept children reversed and learned their disliked noun pairs
more readily. Further analysis revealed that these self-concept patterns were
most pronounced for low-IQ children. High-IQ children, who were superior
to low-IQ children in overall learning, showed no preference for their affective
evaluations in learning. Increasing task difficulty (e.g., by increasing word
abstractness) resulted in a tendency to learn disliked items more readily than
liked items.


Previous research has shown that the
connotative meaning of verbal items is fairly
well established by the second grade (e.g., Di
Vesta, 1966; Di Vesta & Walls, 1969) and
that by the sixth grade, children possess the
ability to utilize the evaluative connotation
of words to encode and facilitate the later
recall of these words (e.g., Cermak, Sagotsky,
& Moshier, 1972; Kail & Schroll, 1974). The
precise nature of evaluative encoding in
children’s memory is not clear in that evaluative
connotation may involve both a
cognitive component (e.g., a set of implicit
associative responses that share a common
meaning), and an affective component (e.g.,
an individual's unique and distinctive evaluative
assessments). Traditionally, measures
of evaluation have been obtained by
the semantic differential instrument (Osgood,
Suci, & Tannenbaum, 1957), which by
means of its normative technique, tends to
equate the affective component with the
cognitive component. For example, Johnson,
Frincke, and Martin (1961) reported
that evaluative positive words, even when 
matched in frequency of usage with evalua- 
tive negative words, evoke richer associative 
networks that provide stronger stimulus 
support for the positive words. Neverthe- 
less, there are emerging points of view that 
find it helpful to keep the two components 
Separate. 

One such view has been advanced by Ry- 
chlak (1977), who has introduced a line of 
theory (i.e., logical learning theory) and re-
search that emphasizes the role of an affec-
tive dimension of connotative meaningful-
ness in verbal learning. Unlike the cognitive
metric of meaningfulness, the present in- 
terpretation employs idiographic ratings of 
evaluation and is oriented toward the per- 
sonal significance that such evaluations have 
for the individual. 

In the typical experimental procedure, the
investigator instructs the subjects to prerate 
stimulus items (e.g., words, consonant- 
vowel-consonant trigrams, contextual 
phrases, and designs) for likability. Then, 
while controlling for the possible effects of 
association value and frequency of word 
usage, he demonstrates a differential reac- 
tion that subjects have for such uniquely 
determined affective assessments in subse-
quent learning tasks. This measure of
meaningfulness is termed reinforcement 
value and implies that when an item is pos-
itive (i.e., liked), it has a different meaning-
fulness for the subject than when it is nega-
tive (i.e., disliked). This assumption has
been supported by a number of investiga- 
tions that have consistently shown that even 
when subjects rate stimulus items as positive
that have obvious negative connotations
(e.g., traitor and failure), they learn these
items faster than items that they rate as 
negative (Abramson, Tasto, & Rychlak, 
1969; Rychlak, 1966). Furthermore, it has 
also been found that abnormal subjects 
(primarily schizophrenics) show no such 
tendency to learn positively assessed items 
faster than negative items and instead re- 
verse and learn their negatively assessed 
items more readily (Rychlak, McKee,
Schneider, & Abramson, 1971). In short,
what subjects judge to be uniquely positive 
or negative, even when such assessments are 
at variance with normative ratings, still bears 
a predictable relationship to rate of acqui- 
sition. 

One major point of theoretical interest is 
to explain how such meaningfulness origi- 
nates and how the learning proceeds ac- 
cordingly. One of the basic tenets of logical 
learning theory is that human beings from an 
early age begin to organize, pattern, and 
structure various stimuli into what is affec- 
tively more or less meaningful for them. It 
is further theorized that individuals affec- 
tively relate to items in their environment in 
accordance with how they affectively feel 
about themselves. Once normal individuals
prejudge items in terms of “good versus bad”
or “liked versus disliked,” they identify 
themselves with items evaluated good or 
likable because they literally view them- 
selves as good or likable. Those items judged 
as positive thus become more meaningful 
and are acquired more readily in the learning 
task. It follows that individuals who per-
ceive themselves in negative terms will
identify along the negative dimension and
will manifest a diminution in the learning of
positive as opposed to negative items and
occasionally reverse and learn more effi-
ciently their negatively assessed items.

Continuing in this vein, an important as-
sessment in any learning situation is the af-
fective assessment a person makes of his own
ability to accomplish tasks, meet challenges, 
and so on, all of those things we generally 
include when the self-concept is under con- 
sideration. If the person has a positive 
self-concept, logical learning theory would 
predict that he or she would be more likely 
to extend positive than negative meanings 
in a learning task. Conversely, if the person
has a negative self-concept, we would expect 
him or her to extend negative meanings as 
readily or even more readily than positive 
meanings in the task. Research to date has 
supported these expectations in that both 
normal adolescents and children with posi- 
tive self-concepts have been found to learn 
verbal materials that are positive in rein- 
forcement value more readily than materials 
that are negative. However, their negative 
self-concept counterparts minimize this 
“positive reinforcement value effect” or re- 
verse it entirely by learning disliked more 
readily than liked verbal materials (August 
& Felker, 1977; August, Rychlak, & Felker, 
1975; Rychlak, Carlsen, & Dunning, 1974). 
While the employment of self-concept as 
a moderator variable in the studies cited has 
yielded contrasting affective learning styles, 
the question is still open as to whether this 
personality construct acts independently or 
whether its apparent influence is a specious 
one caused by its inherent interrelationship 
with other potent variables that also have a 
demonstrable effect on learning (e.g., intel- 
ligence and task difficulty). When working 
with children, it is sometimes difficult or 
impossible to separate IQ level from self-
concept level, so that the investigator is not
sure whether the two subgroups under study 
are high or low in self-concept or whether 
they are simply divided along IQ lines. This 
problem arose in the Rychlak and Saluri 
(1973) study, in which low self-concept 
(fifth- and sixth-grade) children who were 
significantly lower in IQ than the high self- 
concept children failed to reverse the cus- 
tomary positive reinforcement value effect. 
Indeed, the liked-over-disliked advantage 
was even greater for the low self-concept 
children.
There is a theoretical justification for this 
finding, since it is postulated by logical
learning theory that affective influences are
operative on learning from a very early age. 
The connotative meaning of stimulus items 
is therefore a very basic feature of human 
learning. It is possible that the lower IQ 
child, as a result of deficiencies in verbal 
learning strategies, will be more sensitive to 
the connotative features of a learnable item 
than he or she will be to its denotative fea- 
tures. This, in turn, could magnify the ef- 
fects of self-concept on affective learning 
style at the low-IQ level and attenuate or 
mask such effects at the high-IQ level. 
While the manifest difficulty of a task may 
reflect the child’s intellectual abilities, it is 
also possible that components intrinsic to 
the task (e.g., orienting instructions, and 
abstractness of the to-be-remembered items) 
may act to increase the level of difficulty and 
hence amplify the sensitivity to affection. 
In this regard, we have cross-validating evi-
dence that children who take significantly
more trials to reach criterion in a verbal 
learning task also exhibit the largest differ- 
ential in affective learning preferences 
(August & Felker, 1977; August et al., 1975). 
Task difficulty relative to a subject's ca- 
pacity is therefore of crucial importance to 
any discussion of affective learning style.
In the present study, task difficulty is 
operationally defined in terms of the imag- 
ery-evoking capacity of the to-be-remem- 
bered items. Paivio (1971) has stated that 
concrete and abstract verbal materials differ 
in the extent to which they convey meaning 
that is represented mentally in the form of 
nonverbal spatial imagery. He argues that 
concrete materials (e.g., words like house, 
truck, and game) evoke nonverbal images 
more readily than abstract materials (e.g., 
words like hope, wish, and idea). Research 
has demonstrated that subjects asked to 
learn abstract materials in paired-associate 
tasks have more difficulty reaching criterion 
than subjects asked to learn concrete mate- 
rials, and this holds equally for all levels of 
IQ (e.g., Paivio, 1971). 

The natural question arises: Is it general 
intellectual ability or intrinsic task difficulty 
that accounts for the majority of variance in 
reinforcement value findings across levels of 
IQ? It is also of crucial importance to es-
tablish whether either of these factors affects
the already established learning styles of
subjects who have varying levels of self- 
concept. Based on what we know to this 
point, the following hypotheses were put to 
test in the present research: (1) High self-
concept children will learn along the positive
affective dimension more pronouncedly than 
will low self-concept children. (2) Disre- 
garding the intelligence of the subjects, the 
more difficult a verbal learning task is, the 
more pronounced will the reinforcement 
value effects be in the observed data. (3) 
Disregarding task difficulty, the less intel- 
ligent a subject is, the more pronounced will 
his reinforcement value effects be in the
observed data. And (4) most important of 
all the tenets of logical learning theory, nei- 
ther intellectual level nor task difficulty will 
reverse the expected finding on Hypothesis
1.


Method 
Subjects 


The original sample included 96 fifth-grade children
drawn from two elementary schools in western Penn- 
sylvania. Only those students returning parental 
consent forms were eligible to participate in the various 
aspects of the study. IQ and self-concept scores were
obtained for each of these children on tests adminis- 
tered several weeks before the initial experimental task. 
From this original sample, 64 children were selected on 
the basis of whether or not their scores fell into the 
designated levels as stipulated in the experimental de- 
sign. The final sample of children was equally divided 
by sex with a mean age of 10 years and 6 months.


Materials 


Piers-Harris Self-Concept Scale. All of the children 
were pretested with the Piers-Harris Self-Concept Scale
(Piers & Harris, 1964), which is described in previous 
research (August et al., 1975). 

Word Assessment Scale. All of the children were
administered the Word Assessment Scale on two sep- 
arate occasions. This scale was specifically designed 
for the purpose of measuring affective assessments for 
lists of common words. Children were instructed to 
illustrate their specific assessments by sketching in the 
mouth on the figure of a human face that adjoined each 
word on the scale. Depending on the assessment, the 
mouth could either be drawn in a smiling position, in- 
dicating a liked assessment, or in a frowning position, 
indicating a disliked assessment. In addition, the
children were told to mark an “X” across any words that 
they found too difficult to rate or that were unfamiliar 
to them. The scale contained 200 monosyllabic nouns
that ranged from four to six letters in length. The
words were taken from third- and fourth-grade spelling
texts and were subsequently calibrated for word fre-
quency count and imagery value according to the Paivio,
Yuille, and Madigan (1968) word norms. For the pur-
pose of the present investigation, the experimental
words were equally divided into a category of concrete
nouns (e.g., truck) and a category of abstract nouns (e.g.,
idea).


Table 1
Means and Standard Deviations on Self-concept and IQ for Each Experimental Group

                                                  IQ           Self-concept
Group                                           M      SD       M      SD
High IQ
  High self-concept, high task difficulty    119.75   6.27    65.00   5.09
  High self-concept, low task difficulty     119.25   6.45    69.87   3.58
  Low self-concept, high task difficulty     121.25   6.56    48.50   4.37
  Low self-concept, low task difficulty      117.50   7.50    50.25   3.50
Low IQ
  High self-concept, high task difficulty     89.25   7.51    64.12   4.58
  High self-concept, low task difficulty      92.25   5.60    66.50   7.23
  Low self-concept, high task difficulty      93.62   7.38    44.63   5.09
  Low self-concept, low task difficulty       92.88   8.06    46.15   5.78


Design 


Subjects were placed into four groups based on their 
scores on the Lorge-Thorndike Intelligence Test and
the Piers-Harris Self-Concept Scale. For both tests, 
selection into either a high or low group was determined 
by utilizing the standardized mean and standard de- 
viation of the tests as reference points. Subjects whose 
scores fell between the range of .5 and 1.5 standard de- 
viations above the means of the respective tests were 
designated as both the high-IQ and high self-concept 
groups, while subjects scoring between the range of .5 
and 1.5 standard deviations below the means comprised
the low-IQ and low self-concept groups. Subjects were 
further assigned at random to either a group receiving 
lists of abstract paired associates (i.e., high task diffi-
culty) or a group receiving lists of concrete paired as-
sociates (i.e., low task difficulty). The resultant means
and standard deviations for the eight experimental
groups are displayed in Table 1. A comparison of
means revealed no significant differences between the
means of the high-IQ or high self-concept groups nor
between the low-IQ and low self-concept groups.
Thus, the design included 2 X 2 X 2 X 2 (IQ X Self-
concept X Task Difficulty X Reinforcement Value)
groups. The first three variables are between-subjects 
factors and the last is a within-subjects factor. Each 
experimental cell contained eight subjects, equally 
represented with boys and girls. 
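
The selection rule described in this section can be illustrated with a minimal sketch. It assumes the band limits are inclusive, and the test mean and standard deviation passed in the usage note (100 and 16) are placeholders chosen for illustration, not values reported in this study. The function name `classify_score` is likewise hypothetical.

```python
from typing import Optional


def classify_score(score: float, mean: float, sd: float) -> Optional[str]:
    """Classify a test score under the 0.5 to 1.5 SD band rule.

    Returns "high" for scores 0.5 to 1.5 SD above the mean, "low" for
    scores 0.5 to 1.5 SD below it, and None for scores outside both bands
    (such subjects would not be selected).
    """
    z = (score - mean) / sd  # distance from the mean in SD units
    if 0.5 <= z <= 1.5:
        return "high"
    if -1.5 <= z <= -0.5:
        return "low"
    return None
```

With the placeholder norms, `classify_score(116, 100, 16)` falls in the high band, while a score at the mean or more than 1.5 SD from it is left unselected.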


Procedure 


IQ data were obtained from a test administration 
conducted 1 month prior to the commencement of the 
experiment. 


During the initial experimental testing period, each 
subject participating in the study was administered 
both the Piers-Harris Self-Concept Scale and the Word 
Assessment Scale by means of a group procedure. A 
second administration of the Word Assessment Scale 
was conducted 48 hours later for reliability purposes. 
The ratings on the Word Assessment Scale for each 
subject were tabulated by compiling a list of the 
subject’s liked nouns and a list of the disliked nouns. 
Depending on whether a subject was included in the 
concrete or abstract paired-associates groups, only those 
ratings relevant to his group were actually retained for 
experimental purposes. Moreover, only those nouns 
that were reliably rated across two testing administra- 
tions of the scale were eligible for selection. To maxi- 
mize the idiographic nature of the affective assessments, 
an effort was made to include in the final pool of con- 
crete and abstract nouns only those which approxi- 
mated a like-dislike normative ratio of one to one. 
That is, if a noun was rated by approximately one half 
of the subjects in that specific group as liked and rated 
by the other one half as disliked, the word was retained. 
In this way, it is possible that one subject's liked noun 
may be another subject's disliked noun. 
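
The word-selection steps above can be sketched as follows. This is only an illustration under stated assumptions: ratings are represented as the strings "liked" and "disliked", any other mark (such as an "X") is treated as unrated, and the `tolerance` around a one-to-one ratio is a hypothetical parameter, since the text says only that the ratio was approximately one to one.

```python
from typing import Dict, List


def reliable_ratings(first: Dict[str, str], second: Dict[str, str]) -> Dict[str, str]:
    """Keep words given the same liked/disliked rating on both administrations."""
    return {
        word: rating
        for word, rating in first.items()
        if rating in ("liked", "disliked") and second.get(word) == rating
    }


def is_balanced(word: str, group_ratings: List[Dict[str, str]],
                tolerance: float = 0.1) -> bool:
    """True if about half of the subjects who rated the word rated it liked.

    `tolerance` is an assumed cutoff for "approximately one to one".
    """
    votes = [ratings[word] for ratings in group_ratings if word in ratings]
    if not votes:
        return False
    liked_share = votes.count("liked") / len(votes)
    return abs(liked_share - 0.5) <= tolerance
```

A word rated liked on one occasion and disliked on the other is dropped by `reliable_ratings`, and a word liked by three of four subjects fails `is_balanced` at the default tolerance.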

Paired associates were formed for each subject by 
randomly pairing nouns: liked with liked and disliked 
with disliked. However, when such a procedure pro- 
duced pairs in which the members were semantically or 
phonetically similar, a second selection was made. The 
10 pairs making up each list were further randomized 
and typed in three sequential orders on memory drum 


one acquisition trial 
(presentation of the word pairs) followed by a number 
of recall trials (presentation of the stimulus nouns only). 
This procedure was continued until the subject reached 
the criterion of two trials of perfect anticipations for all 
response members of the 10 pairs. The lists were pre- 
sented on a memory drum with a 2-sec interitem expo- 
sure of the stimulus noun and a 2-sec interitem interval 
for both acquisition and recall trials. There were 4 sec 
allowed between presentation of each of the randomly 
sequenced lists. In all cases, the correct response noun
followed the stimulus noun on the subsequent exposure,
and this served as an indication of correct or incorrect
anticipations for the subject. 
Results


The overall effect of evaluative connota-
tion as measured by the indices of rein-
forcement value was analyzed by means of
a 2⁴ factorial analysis of variance.


Total List Performance 


To determine baseline learning rates for 
the between-subjects variables, a total per- 
formance score was tabulated by collapsing 
both liked and disliked word pairs together. 
This score was determined by recording the 
total number of trials it took the subject to 
reach criterion for all 10 pairs of the list. 
That is, each noun pair had a separate 
trials-to-criterion score, which was achieved 
when the subject made two consecutive 
correct anticipations for that pair without a 
subsequent error, and these 10 scores were 

added together to provide a measure of 
learning rate. Significant main effects were 
obtained for IQ, F(1, 56) = 14.60, p < .001, 
self-concept, F(1, 56) = 4.18, p <.05, and 
task difficulty, F(1, 56) = 17.44, p < .001.
True to expectation, the high-IQ children
learned paired associates faster (M = 25.56,
SD = 12.17) than did the low-IQ children (M
= 37.42, SD = 15.66). Based on this finding,
we may conclude that the variable of intel- 
ligence, as measured and employed in the 
present study, was effective in differentiat- 
ing children’s performance in paired-asso- 
ciates learning. In support of previous re- 
search, the high self-concept subjects were 
also more proficient in learning (M = 28.80, 
SD = 13.04) than were their low self-concept 
peers (M = 34.30, SD = 15.01). Finally, 
with regard to the task difficulty variable, all 
children learning concrete paired associates 
manifested a more efficient learning rate (M 
= 25.12, SD = 12.63) than subjects learning 
abstract word pairs (M = 37.97, SD = 15.04). 
Thus, the learning of abstract paired asso- 
ciates was indeed a more difficult task. 
None of the interactions between variables 
reached a level of statistical significance. 
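
The trials-to-criterion score described above can be sketched in a few lines. This is one reading of the scoring rule, not the authors' procedure: it assumes each pair's record is a list of correct/incorrect recall trials and that the pair's score is the trial on which the second of two consecutive correct anticipations occurred, with no error afterward. The function names are hypothetical.

```python
from typing import List, Optional


def trials_to_criterion(responses: List[bool]) -> Optional[int]:
    """Return the 1-based trial on which criterion was met, or None.

    Criterion (as read from the text): two consecutive correct
    anticipations with no error on any later trial.
    """
    last_error = 0  # 1-based index of the final incorrect trial (0 if none)
    for trial, correct in enumerate(responses, start=1):
        if not correct:
            last_error = trial
    criterion_trial = last_error + 2  # two correct trials after the last error
    return criterion_trial if criterion_trial <= len(responses) else None


def list_score(per_pair: List[List[bool]]) -> Optional[int]:
    """Sum the per-pair scores; None if any pair never reached criterion."""
    scores = [trials_to_criterion(record) for record in per_pair]
    if any(score is None for score in scores):
        return None
    return sum(scores)
```

Passing all 10 pairs of a list to `list_score` gives a total list score under this reading; passing only the liked or only the disliked pairs gives the corresponding within-list score.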


Differential Paired-Associate Learning of
Affectively Assessed Nouns 


Each subject’s paired-associate perfor-
mance was also scored for the two within-list
conditions of liked versus disliked noun
pairs. The scoring procedure employed here
was similar to that described for the total list
performance, with the one exception that
separate trials-to-criterion tabulations were
obtained for the two within-list conditions.
In this way, each subject had a score summed
over his five liked pairs and a score summed
over his five disliked pairs. For example, if
a child's score for his liked pairs equaled 30
and his score for disliked pairs was 42, this
would indicate that it took him fewer trials
to reach criterion for the liked pairs as com-
pared with the disliked pairs and thus would
indicate a superiority for liked items in rate
of learning.

Hypothesis 1 predicts that children with
high self-concepts will learn along the posi-
tive more pronouncedly than children with
low self-concepts. This was confirmed in
the Self-concept X Reinforcement Value

liked nouns (M = 27.56, SD = 12.71) more
readily than their disliked nouns (M = 30.08,
SD = 14.90), whereas the low self-concept
subjects actually learned their disliked nouns
(M = 30.81, SD = 13.62) more readily than
their liked nouns (M = 37.78, SD = 16.17).

Hypothesis 2 predicts that the more dif-
ficult a task is, the more pronounced will the
reinforcement value effects be in the data 
array. This was confirmed in the Task 
Difficulty x Reinforcement Value interac- 
tion, with F(1, 56) = 


25.09, SD = 12.35) and disliked (M = 25.16, 


abstract nouns, an interesting reversal oc- 
curs, so that the disliked words are actually 
being learned more readily (M =


17.63 


Hypothesis 3 predicts that when task
difficulty is disregarded, the less intelligent 
subjects will have greater reinforcement 
value effects than the more intelligent 
subjects. This was tested in the IQ x Re- 
inforcement Value interaction, which fell 
short of significance, with F(1, 56) = 1.70,
ns.

The final hypothesis predicts that the
basic patterning of self-concept in affective
learning style will not be reversed by either
task difficulty or intelligence. The three-
factor interaction of Task Difficulty X
Self-concept X Reinforcement Value was not
significant, indicating that the predicted
affective learning styles of high and low
self-concept children were not significantly
disrupted by varying the level of task diffi-
culty. However, the three-factor interaction
of IQ X Self-concept X Reinforcement Value
did reach a level of statistical significance,
F(1, 56) = 13.56, p < .001. Table 2 shows
the mean number of trials to criterion for the
liked and disliked word pairs broken down
by sample characteristics of IQ and self-
concept. The means for this interaction are
graphically displayed in Figure 1.


Table 2
Means and Standard Deviations for Reinforcement Value by IQ and Self-concept

                              Reinforcement value
                          Positive          Negative
Group                     M      SD         M      SD
High IQ
  High self-concept     22.44   8.66      22.56   8.44
  Low self-concept      29.88  10.92      27.81  14.17
Low IQ
  High self-concept     32.69  12.95      37.50  14.61
  Low self-concept      45.69  18.09      33.81  16.89


Figure 1. Mean trials to criterion for IQ by self-concept on reinforcement value.

It may be noted from the patterning of 
means depicted in Figure 1 that IQ did not 
reverse the basic findings of self-concept and 
affective learning style. However, by in- 
cluding IQ in the analysis, we do find that 
the self-concept effects are most vivid for the 
low-IQ children. Separate t tests were 
conducted to determine whether there were 
significant differences in the learning of liked 
versus disliked word pairs for any of the ex- 
perimental groups, in accordance with 
Winer's (1971) recommendation for making 
a priori orthogonal comparisons following 
significant interactions. At the high-IQ 
level, both the high and low self-concept 
groups showed no significant preference in 
the learning of liked versus disliked word 
pairs. At the low-IQ level, however, the high
self-concept children manifested a signifi- 
cant preference to learn their liked word 
pairs more readily than their disliked ones, 
t(30) = 2.89, p < .01, whereas the low self- 
concept children reversed this memory 
predilection and showed a tendency to learn 
disliked word pairs faster, t(30) = 9.85, p < 
.001. We therefore conclude, in reference to 
Hypothesis 4, that the variable of IQ did not 
reverse the findings supporting Hypothesis 
1. Nevertheless, IQ did qualify the role of 
self-concept in affective learning style in that 
the low-IQ children clearly manifested the 
strongest sensitivity to their affective as- 
sessments in learning. 


Discussion 


In studying affective behaviors, it is im- 
portant to distinguish between the sheer 
difference observed across levels of like- 
dislike and direction in which such differ- 
ences are manifested. Logical learning 
theory presumes that it is up to the subject 
who does the evaluating to set the direction 
of an affective difference, even though the 
experimenter can help adjust conditions to
favor one assessment (e.g., liked) over the 
other (disliked). For example, Rychlak et
al. (1974) asked subjects to learn words that
had been preselected to refer to either au- 
thority relations or close interpersonal 
relations with peers. Subjects were then 
selected who expressed a “hang up” in either 
of these areas and a strength in the other. 
True to prediction, the same subject learned 
along negative lines when memorizing 
hang-up words and along positive lines when 
memorizing words in the area of strength. 
This is taken as evidence that subjects can 
assess an area as positive or negative, effec- 
tively breaking up the globality of their 
self-concept to say “I am a positive behaver 
in this context” (strength) and “I am a neg- 
ative behaver in that context” (hang up). 
These are two different assessments; hence, 
what is extended meaningfully in the learn- 
ing will vary accordingly even though the 
same subject does all of the learning. 
Another manifestation of this tendency to 
extend meaning in learning differentially 
was demonstrated by Marceil (1975), who 
first had subjects observe a subject being 
administered paired associates by memory 
drum and then asked them whether they 
would or would not like to participate in such 
an experiment. This was tantamount to 
asking subjects for a rating of their affection 
concerning the experiment. Subsequently, 
after all subjects had rated verbal materials 
for reinforcement value, both subjects who 
wished to perform in the task and those who 
did not were administered paired associates. 
As predicted, those subjects who assessed the 
task positively by wanting to participate 
learned their liked materials more readily 
than their disliked, and those who assessed 
it negatively by preferring to avoid the ex- 
periment learned along the negative. 
Putting these findings together, the au- 
thors would suggest that the findings on task 
difficulty and the reversal of reinforcement 
value effects are due to the likely possibility 
that subjects who had to learn the abstract 
nouns predicated this task negatively. It is
not uncommon for children to equate “dis- 
liked” with “hard” school subjects. Chil- 
dren usually cannot explain why they find a 
subject difficult to master except that they 
“never liked it.” This is often taken by the 
psychologist as a naive analysis, but logical 
learning theory takes such statements lit-
erally as reflecting a true reason. The
subjects of our study did indeed find it more
difficult to learn the abstract nouns (see the
Results section), and in so doing, they re- 
flected a disliking pattern by achieving cri- 
terion on the negative meanings more readily 
than the positive meanings among their word 
pairs. 

Why should we be concerned about posi- 
tive learning? Because from our work on 
the self-concept, there is reason to fear that 
children who orient their learning in the 
negative direction can possibly develop an 
abnormal pattern of behavior. Further-
more, we hold to the view that “learning how
not to learn” can occur (see Rychlak, 1977,
p. 328). Much of what is “negative” in the
learning situation concerns misunder- 
standing, getting the wrong predication 
aligned for the task at hand, doing the re- 
verse of what the instructor desired, and so 
on. It is therefore taken as a general as- 
sumption that learning along the positive is 
to be preferred to learning along the nega- 
tive—even though human beings always 
learn both ways in life. That is, a relative 
preponderance of learning along the positive 
would suggest that the individual is assessing 
life positively, looking forward to good,
helpful, desirable, and pleasant things hap-
pening and that these things are in fact
coming to pass. It is, of course, theoretically 
possible that certain individuals may ac- 
tively repress the negative aspects of life, so 
that in time, they would also develop ad- 
justment problems. 

Even though the data testing Hypothesis 
3 did not prove significant, Figure 1 leaves 
little doubt that, at least for this sample, 
taking intelligence into consideration 
brought out the self-concept patterns most 
vividly. That is, the low-IQ children, who
incidentally take significantly more trials to
reach learning criterion, exhibit a marked
sensitivity to the evaluative connotation that 
stimulus items have for them. However, the 
direction of such affective learning prefer-
ences contrasts diametrically as a function of
the children’s self-concepts. High self- 
concept children with low IQs showed the 
predicted positive reinforcement value ef-
fect, learning more readily along the positive
dimension of affective meaningfulness; while
the low self-concept children at this IQ level
showed a clear reversal in the positive rein-
forcement value effect. On the other hand,
high-IQ children, who were significantly
faster in overall learning performance, did
not show any reliance on their affective as-
sessments in learning.
Surely, if our sample had consisted exclu-
sively of high-IQ subjects, it would have been
detrimental to the finding of contrasting
affective learning styles for our two levels of
self-concept.


References

Abramson, Y., Tasto, D. L., & Rychlak, J. F. Nomothetic versus idiographic
influences of association value and reinforcement value on learning. Journal
of Experimental Research in Personality, 1969, 4, 65-71.

August, G. J., & Felker, D. W. Role of affective meaningfulness and
self-concept in the verbal learning styles of white and black children.
Journal of Edu-

Kail, R. V., & Schroll, J. T. Evaluative and taxonomic encoding in children’s
memory. Journal of Experimental Child Psychology, 1974, 18, 426-437.

Marceil, J. C. The role of the person in the psychology experiment: A
reappraisal of the problems of reinforcement, response set and volunteer
error. Master’s thesis, Purdue University, 1975.

Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. The measurement of meaning.
Urbana: University of Illinois Press, 1957.

Paivio, A. Imagery and learning. In S. J. Segal (Ed.), Imagery: Current
cognitive approaches. New York: Academic Press, 1971.

Paivio, A., Yuille, J. C., & Madigan, S. A. Concreteness, imagery, and
meaningfulness values for 925 nouns. Journal of Experimental Psychology
Monograph, 1968, 76(1, Pt. 2).

Piers, E. V., & Harris, D. B. Age and other correlates of self-concept in
children. Journal of Educational Psychology, 1964, 55, 81-85.

Rychlak, J. F. Reinforcement value: A suggested idiographic, intensity
dimension of meaningfulness for the personality theorist. Journal of
Personality, 1966, 34, 311-335.

Rychlak, J. F. The psychology of rigorous humanism.
Locus of Control in Relation to Academic Attitudes
and Performance in a Personalized System 
of Instruction Course 
This study examined locus of control in relation to academic attitudes versus
study habits, rate of progress, and final achievement based on differential pre- 
dictions derived from social learning theory and attribution theory. Rotter's 
Internal-External Locus of Control Scale (I-E) and the Survey of Study Hab- 
its and Attitudes were administered to 138 undergraduate students in a per- 
sonalized system of instruction course in introductory psychology. The re- 
sults of multiple regression analyses indicated that the I-E scale is related (p 
< .05) only to academic attitudes and that study habits are related to both per-
formance measures. It was inferred that attribution theory provides the best
explanation for these results.


The degree to which perceived locus of 
control can be used to predict academic 
achievement has received considerable at- 
tention in the last few years (for reviews, see 
Lefcourt, 1976; Phares, 1976). A common 
argument for the expected relationship be- 
tween internality and academic achievement 
stems from the assumption that if a person
believes that one's successes and failures are
the result of one's own behavior, the
person will be more likely to exhibit initia- 
tive and persistence in seeking achievement 
goals (Lefcourt, 1976; Rotter, 1966). The 
individual would thereby acquire more in- 
formation and greater problem-solving skill 
(McGhee & Crandall, 1968). However, the 
cumulative result of these studies is ambig- 
uous. Phares (1976) concluded his review 
by suggesting that internality does tend to 
be related to academic performance. But
Lefcourt (1976), in a review published almost
simultaneously, concluded that the studies
“are often riddled with inconsistent and . . .
‘weird’ results” (p. 71).

Five studies not included in the above re- 
views do not help to resolve the issue. One 
(Prociuk & Breen, 1974) found a relationship
between perceived locus of control and
achievement. Three of the studies (Allen,
Giat, & Cherney, 1974; Daniels & Stevens,
1976; Parent, Forward, Center, & Mohling,
1975) found an interactive relationship be-
tween perceived locus of control, course
structure, and achievement, with externals
performing better under highly structured
conditions. The fifth study (Johnson &
Croft, 1975) found no differences in
achievement due to locus of control or
structure.


(Cronbach, 1975; Cronbach & Snow, 1977) 
However, Rotter (1966, 1975) has consis 

for locus of control and performance in sit- 

in studies of collegiate performance have 
been in school for many years and would 


tend to have considerable experience in 
similar settings. Consequently, generalized 
expectancies may not be a good predictor in 
typical academic situations (Phares, 1976; 
Rotter, 1975). 

Another source of inconsistency in results 
could be due to a confounding in much of the 
research on the relationship between locus
of control and achievement. An alternative
theory (Weiner, 1974) predicts that locus of
control will be more directly related to atti- 
tudes than achievement. This theory is 
based on the work of Heider (1958), who 
suggested that there are four perceived de- 
terminants to which an outcome of behavior 
may be attributed: ability, effort, task dif-
ficulty, and luck. According to Weiner et al.
(1971), a confounding occurs in the social
learning theory research (Rotter, 1954, 1966)
when these four determinants are collapsed
into a single dimension of perceived internal
(ability and effort) versus external (task
difficulty and luck) determinants. It is
possible to combine the four elements in a
different manner that reflects the perceived
stability of the determinants rather than the 
perceived locus of control (Weiner et al., 
1971). This dimension results from com- 
bining ability with task difficulty (stable 
elements) and effort with luck (unstable el- 
ements). The confounding in the social 
learning research, according to Weiner, has 
resulted from the tendency (e.g., Holden & 
Rotter, 1962; James & Rotter, 1958; Phares, 
1957; Rotter, Liverant, & Crowne, 1961) to 
compare ability-related tasks (a stable, in- 
ternal determinant) to luck-related tasks (an 
unstable, external determinant). Therefore, 
these studies differ on the dimension of 
stability in addition to the dimension of 
locus of control. 
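
The two-way classification described above can be written out as a small lookup table. The sketch below is illustrative only: the dictionary layout and the helper names `group_by` and `differing_dimensions` are assumptions introduced here, not anything taken from Weiner's or Rotter's work. It shows why a comparison of ability tasks with luck tasks varies both dimensions at once.

```python
from collections import defaultdict

# Each perceived determinant classified on the two dimensions named in the text.
DETERMINANTS = {
    "ability":         {"locus": "internal", "stability": "stable"},
    "effort":          {"locus": "internal", "stability": "unstable"},
    "task difficulty": {"locus": "external", "stability": "stable"},
    "luck":            {"locus": "external", "stability": "unstable"},
}


def group_by(dimension):
    """Collapse the four determinants onto a single dimension."""
    groups = defaultdict(list)
    for name, dims in DETERMINANTS.items():
        groups[dims[dimension]].append(name)
    return dict(groups)


def differing_dimensions(a, b):
    """List the dimensions on which two determinants differ."""
    return [d for d in ("locus", "stability")
            if DETERMINANTS[a][d] != DETERMINANTS[b][d]]


by_locus = group_by("locus")          # the social learning theory grouping
by_stability = group_by("stability")  # the alternative grouping
confound = differing_dimensions("ability", "luck")
print(by_locus)
print(by_stability)
print(confound)
```

Because ability and luck differ on both locus and stability, a study that contrasts only those two task types cannot say which dimension produced the effect.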

In several studies designed to control for 
this confounding (e.g., Frieze & Weiner, 
1971; McMahan, 1973; Weiner, Heckhausen, 
Meyer, & Cook, 1972), Weiner and his asso-
ciates have found that performance is related
more to the stability dimension than to locus 
of control. They have also found that locus 
of control is related to affective responses. 
In the studies reviewed by Weiner (1974), 
ascriptions of goal attainment to ability or 
effort (internal determinants) tended to be 
related to affective responses such as pride 
of accomplishment or fear of failure. When 
the ascriptions were to external determi- 
nants, there seemed to be less of an affective 
response. This would suggest that in an 
academic context where achievement may 
be presumed to be primarily a function of 
effort and ability, the internally oriented 
students will have a greater affective re- 
sponse than will the externally oriented 
students. However, this has not been tested, 
since all of the studies reviewed by Weiner 
(1974) were experimental studies in which 
the conditions were manipulated to effect 
subjects’ affective states; none of the studies 
referred specifically to the relationship be- 
tween locus of control and academic per- 
formance versus academic attitudes as in the 
present study. 

Thus, there are conflicting theoretical 
positions with respect to the predicted re- 
lationship between locus of control and ac- 
ademic achievement. The present study 
was designed to study the question of 
whether locus of control would be related 
more to attitudes than performance. In 
keeping with Weiner’s (1974) findings that 
internals will exhibit a greater degree of af- 
fect than will externals in response to an 
achievement situation where goal attain- 
ment tends to be a function of ability or ef- 
fort, it was expected in the present study 
that internality would be related to a positive 
affect toward education. Presumably, the 
affective reactions of internals will be more 
positive following success and more negative 
following failure in a skilled task context. 
Furthermore, the measures of locus of con- 
trol and affect were taken at the beginning 
of a semester. At that time, it was assumed 
that the subjects’ affect would be influenced 
primarily by their overall success in school. 
Had the measures been taken at the end of 
a semester, they would presumably have 
been influenced more heavily by the imme- 
diate outcomes of that semester. 

The appropriate comparisons were made 
in the present study by using a locus of con- 
trol scale (Rotter, 1966), two behavioral in- 
dicators of performance (rate of progress and 
final achievement), and a self-report mea- 
sure of study habits and attitudes (Brown & 
Holtzman, 1967). Since a measure of atti- 
tudes rather than affect was used, it is nec- 
essary to consider the relationship between 
attitudes and affect in relation to locus of 
control and performance. Attitudes gen- 
erally bear a stronger relationship to affect 
than to performance, particularly when they 
are measured by self-reports of inner states 
of mind as opposed to being inferred from 
behaviorally defined groups of responses 
(McGuire, 1968). Furthermore, following 
the component structure of attitudes sug- 
gested by Katz and Stotland (1959), many of 
the items used in the attitude scales of the 
Survey of Study Habits and Attitudes ap- 
pear to be statements of positive or negative 
feelings and beliefs rather than commit- 
ments to action. Most of the action-type 
statements are low-inference items indicat- 
ing study habits rather than attitudes. 
Thus, there seemed to be no reason to expect 
a strong relationship between attitudes and 
performance in the present study, but a re- 
lationship between locus of control and at- 
titudes and between study habits and per- 
formance was expected based on attribution 
theory and previous research. 

The present study was conducted with an 
introductory psychology course utilizing the 
Personalized System of Instruction (PSI) 
introduced by Keller (1968). In a PSI 
course, students proceed at their own rate 
and take exams individually as they finish 
each module of the course. Since locus of 
control has been correlated with effective 
time utilization (Gozali, Cleary, Walster, & 
Gozali, 1973), and since success in a PSI-type 
course would tend to be related more to 
ability and effort than to task difficulty or 
luck, it was felt that this would be an ap- 
propriate context for the present study. 

Furthermore, the problem under investiga- 
tion pertains directly to PSI-type courses. 
Johnson and Croft (1975) found no rela- 
tionship between locus of control and three 
indicators of PSI course performance 
(grades, time to completion, and atten-
dance). However, Daniels and Stevens
(1976) did find superior performance on the 
part of internals in a self-paced, criterion 
referenced section of a course that was very 
similar to a PSI design. In addition, Allen 
et al. (1974) found that internals contracted 
for and earned higher grades than externals.
The present study used a comparable
treatment and incorporated a rate-of-pro-
gress measure. It was expected that if in-
ternals used their time more effectively or
had less of a tendency to procrastinate, there
would be a correlation between internality
and rate of progress.
In summary, the primary purpose of the
present study was to determine whether
locus of control would be related more to
academic performance or academic attitudes
based on self-report and behavioral data.
Specifically, it was expected that locus of
control, as measured by Rotter’s Internal-
External Locus of Control Scale (I-E scale;
Rotter, 1966), would have a stronger rela-
tionship to attitudes than habits as mea-
sured by the Brown and Holtzman (1967)
Survey of Study Habits and Attitudes
(SSHA). Furthermore, with respect to the
behavioral indicators, it was expected that
locus of control would be related to rate of
progress but not to final achievement. And,
as a check on the validity of the SSHA, it was
expected that study habits would have a
stronger relationship to the two behavioral
indicators of performance than would study
attitudes.


Method 
Subjects 


Locus of Control Measure 


Rotter's (1966) I-E scale contains 29 items of which 
6 are filler items. It is keyed so that a high score on the
total scale indicates an external orientation. Reliability
estimates reported by Rotter (1966) and in more recent
reports (e.g., Lefcourt, 1976) warrant its acceptability
for use in group study.


Criterion Measures 


Study habits and attitudes. The SSHA contains 100
items that are grouped into four basic scales: Delay
Avoidance (DA), Work Methods (WM), Teacher
Approval (TA), and Education Acceptance (EA). The
first two scales (DA and WM) are added to form a study 
habits (SH) score. The latter two (TA and EA) are 
added to form a study attitude (SA) score. All four 
scales are added to form a study orientation (SO) score. 
While reliability and validity studies (Brown & Holtz- 
man, 1967) have demonstrated sufficient stability and 
predictive and discriminant validity to justify its use in 
research studies, it is not recommended as a selection 
tool. Scale intercorrelations usually range from .49 to
.70, causing some overlap in meaning among the four
scales. Studies that provide evidence pertaining to the
discriminant validity of the four scales generally indi-
cate that there is a general study skills factor in the
SSHA (Khan & Roberts, 1975), but that there is justi-
fication for the habits as opposed to attitudes scales.


Rutkowski and Domino (1975) found support for the 
psychological meaning of the scales in relation to the 
California Personality Inventory. In the process of 
studying factors that contribute to college success, 
McCausland and Steward (1974) found that high school 
attendance and grade point average were more strongly 
related to SH than to SA. 
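
The way the second-order SSHA scores are built from the four basic scales (SH = DA + WM, SA = TA + EA, SO = all four) can be sketched in a few lines. The class name `SSHAScores`, its field names, and the example values are hypothetical and serve only to illustrate the arithmetic; they are not taken from the SSHA manual.

```python
from dataclasses import dataclass


@dataclass
class SSHAScores:
    """Four basic SSHA scale scores for one student (illustrative values only)."""
    da: int  # Delay Avoidance
    wm: int  # Work Methods
    ta: int  # Teacher Approval
    ea: int  # Education Acceptance

    @property
    def study_habits(self) -> int:
        # SH: the first two scales added together
        return self.da + self.wm

    @property
    def study_attitudes(self) -> int:
        # SA: the latter two scales added together
        return self.ta + self.ea

    @property
    def study_orientation(self) -> int:
        # SO: all four scales added together
        return self.study_habits + self.study_attitudes


student = SSHAScores(da=20, wm=25, ta=30, ea=28)
print(student.study_habits, student.study_attitudes, student.study_orientation)
```

Since SH and SA share no scales, any overlap between them comes from the intercorrelations among the basic scales rather than from the scoring itself.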

In a more direct study of the criterion validity of the 
SSHA, Goldfried and D'Zurilla (1973) found support 
for the relationship between SH and ratings of effec- 
tiveness when the ratings were by peers and with both 
SH and SA using self-ratings. Their failure to find a 
high degree of discriminant validity between SH and 
SA may be explained in part by the self-rating proce- 
dure used for the criterion variables and in part by their 
use of the second-order scores (SH and SA) in place of 
the four scales. This limitation on the interpretation 
of scores was identified by Khan and Roberts (1975), 
who tested the congruence between a factor structure 
of the SSHA obtained from a sample of subjects and one 
obtained from the a priori classification of items into the 
four scales. The results supported the classification of 
items into the DA, WM, and TA scales but not the EA 
scale. 

In summary, there seems to be clear support for a 
general study skills factor in the SSHA that tends to 
improve predictions of high school and collegiate per- 
formance when combined with ability measures, and 
there is some support for the discriminant validity of 
at least three of the four scales. Because there is some 
ambiguity in this criterion measure, procedures were 
employed in the analysis to control for the bias that 
might result from the intercorrelations of the scales. 

Performance. Two performance measures were
obtained. The first, total points, was used as an indi-
cator of final achievement. The second was slope, or
rate of progress. In a study with an earlier group of 
students in the same introductory course used in the 
present study, Sutterer and Holloway (1975) identified 
five characteristic rates (slopes). The same five 
groupings are used in the present study, with only minor 
modifications in the descriptions. They are (a) high 
rate: a continuous high rate of performance and com- 
pletion of the course before the end of the semester; (b) 
steady rate: a steady rate of performance with some 
students pausing around spring break; (c) inconsistent
moderate rate: similar to steady rate but with more
frequent and longer pauses in between spurts of effort;
(d) late, or delayed-interval, rate: a very long pause or
series of pauses during the first part of the semester 
followed by a high rate of achievement; and (e) below- 
standard rate: low achievers not finishing the course 
or not earning a passing grade of C or better. This 
variable provided a behavioral measure of study habits 
from which it was inferred that the shallower the slope 
(i.e., the lower the rate of progress), the greater the 
procrastination. 


Procedure 


Both tests were administered at the first class session 
and were counterbalanced in the order of presentation.
The PSI course was structured in the customary manner
(Sutterer & Holloway, 1975). At the end of the se-
mester, the students' cumulative records were used to
obtain the total points and slope for each student. 


Data Analysis 


Differences are occasionally found between the sexes 
on locus of control (e.g., Feather, 1968; Parsons & 
Schneider, 1974) and in the personality profiles asso- 
ciated with locus of control (Keller & Pugh, 1976). 
However, since the sex differences, if any, are usually 
small (e.g., Julian & Katz, 1968; Ramanaiah, Ribich, & 
Schmeck, 1975), researchers have not followed a con- 
sistent pattern of either pooling or separately analyzing 
the data for males and females (Phares, 1976). Since 
the focus of the present study was whether locus of 
control related more to performance or attitude, sex 
data were pooled unless results strongly indicated 
otherwise. Consequently, the first step in the data 
analysis was to analyze the correlations between sex and 
other variables in the study. 

The next step was to study the relationship between 
Following Cohen (1968) 
and Kerlinger and Pedhazur (1973), a type of “forward 
and backward” regression analysis was used due to the 


significant contribution to the multiple correlation, but 
that the habit scales either would not contribute at all 


the results in favor of the rival hypothesis, the habit 
scales were entered first, and the residual variance was 
used to test the relationship between attitudes and locus 
of control. 
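
The logic of entering one block of scales first and testing the other block against the remaining variance can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure, which is only partly recoverable from the text: the data are synthetic, the helper names `r_squared` and `blockwise_r2` are hypothetical, and ordinary least squares via NumPy stands in for whatever regression program was used.

```python
import numpy as np


def r_squared(X, y):
    """R^2 from an ordinary least-squares fit of y on X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def blockwise_r2(blocks, y):
    """Enter predictor blocks in order; return cumulative R^2 after each block."""
    entered, cumulative = [], []
    for block in blocks:
        entered.append(block)
        cumulative.append(r_squared(np.column_stack(entered), y))
    return cumulative


# Synthetic stand-ins (assumption): two habit scales, two correlated attitude
# scales, and a locus of control score that depends on one attitude scale.
rng = np.random.default_rng(0)
n = 138
habits = rng.normal(size=(n, 2))
attitudes = 0.5 * habits + rng.normal(size=(n, 2))
locus = -0.4 * attitudes[:, 0] + rng.normal(size=n)

attitudes_first = blockwise_r2([attitudes, habits], locus)
habits_first = blockwise_r2([habits, attitudes], locus)

# Variance the attitude block explains after the habit block is already in.
attitude_increment = habits_first[1] - habits_first[0]
print(attitudes_first, habits_first, attitude_increment)
```

Entering the habit scales first gives them credit for any variance they share with the attitude scales, so an attitude effect that survives in `attitude_increment` cannot be attributed to that overlap.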

In the final part of the analysis, locus of control and 
the SSHA scales were studied with respect to their re-
lationship to the two performance measures. Corre-
lations and a regression analysis were conducted as a
means of testing the validity of the presumed relation-
ships among these variables.
No sex differences were found on locus of 
control. Of all the variables used in this 
study, sex was significantly correlated with 
only two (total points and DA), and those 
were both small in magnitude (.15 and .14, 
respectively). Therefore, separate analyses 
were not performed for the two sexes. 

It can be seen from the zero-order corre- 
lation analysis of Table 1 that all of the locus 
of control measures are more highly corre- 
lated with the two attitude scales (TA and 
EA) than with the study habits scales (DA 
and WM). This tends to offer support for 
the expectations of this study, but due to the 
high correlations among the SSHA scales, 
regression analyses were also performed. 

In the regression analyses comparing the 
locus of control score with the SSHA, TA 
was the only variable that made a significant 
contribution to the multiple correlation (see 
Table 2). This was true both when the at- 
titude scales were entered into the equation 
first and when the residuals were used for the 
comparison after entering the habit scales. 
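
As a rough consistency check on Table 2, the F to enter for the first variable in a regression can be recomputed from its multiple R. The sketch below assumes that all 138 students mentioned in the abstract entered the analysis and uses the standard single-predictor relation F = R²(N − 2) / (1 − R²); the function name `f_single_predictor` is hypothetical.

```python
def f_single_predictor(r, n):
    """F statistic (1 and n - 2 degrees of freedom) for one predictor with correlation r."""
    r2 = r * r
    return r2 * (n - 2) / (1.0 - r2)


# Teacher Acceptance entered first: R = .35, assumed N = 138.
f_ta = f_single_predictor(0.35, 138)
print(round(f_ta, 2))
```

The result is close to 19, which agrees with the reported F to enter of 18.96 once rounding of R to two decimals is allowed for.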

The beta weights provide additional evi- 
dence as to the relative significance of these 
variables. Considering the squared beta 
weights as indicating the ratio of the inde- 
pendent contribution of each predictor of the 
multiple correlation (McNemar, 1969), we 
can see that TA and EA were both consid- 
erably more influential than DA or WM. 
Neither of the two behavioral dependent 
variables (slope and total points) signifi- 
cantly correlated to locus of control. All of 
the SSHA measures were significantly cor- 
related to slope, but the habit scales had a 
higher degree of correlation than did the 


Table 1 
Zero-Order Correlations Between Locus of 
Control and Study Habits and Attitudes 

Survey of Study Habits        Locus of 
and Attitudes scale           control 
Delay Avoidance               -.10 
Work Methods                  -.15* 
Teacher Acceptance            -.35* 
Educational Acceptance        -.27* 

* p < .05. 
Table 2 
Regression of Survey of Study Habits and 
Attitudes on Locus of Control with Attitude 
Scales Entered First and with Habit Scales 
Entered First 

Attitude scales entered first 
Teacher Acceptance        .084   .35   18.96* 
Education Acceptance      .010   .36   78 
Delay Avoidance           .003   .36   AT 
Work Methods              .001   .36   13 

Habit scales entered first 
Work Methods              .001   .15   3.23 
Delay Avoidance           .003   .16   16 
Teacher Acceptance        .084   .35 
Education Acceptance      .010   .36 

* p < .05. 


attitude scales. DA, WM, and EA were
significantly correlated to total points (see
Table 3).

In the regression analyses designed to test
the validity of the SSHA and locus of control
score as predictors of performance, DA was
found to be the only significant predictor of
slope (R = .39; F to enter = 24.87, p < .05)
and of total points (R = .35; F to enter =
17.58, p < .05).


Table 3
Zero-Order Correlations of Locus of Control 
and the Survey of Study Habits and Attitudes 
with Slope and Total Points 

                                              Total 
Scale                                 Slope   points 
Internal-External Locus of Control 
.04 
Delay Avoidance a E. 
Work Methods                          -.23*   .17* 
Teacher Acceptance E 8* 12 
Education Acceptance —992* .17* 

* p < .05. 
lated to slope and three of them to total 
points (see Table 3); however, in the regres- 
sion analyses, DA was the only SSHA scale 
to be significantly correlated to slope and 
total points. This confirms the validity of 
DA as the single best predictor of perfor- 
mance and suggests that the two attitude 
scales (TA and EA) have discriminant va- 
lidity once the correlated variance of the 
scales has been removed. However, the 
predictive validity of the WM scale was not 
confirmed. This would suggest that either 
DA and WM do not have validity as inde- 
pendent scales or the dependent measures 
used in this study were not sensitive to the 
differences. 

Support for the expected relationships of 
locus of control was obtained in that the 
locus of control scale was highly correlated 
with the attitude scales but not with the 
habit scales or the performance measures. 
It was somewhat surprising to find that locus 
of control was not correlated with rate of 
progress, since a relationship between locus 
of control and effective time utilization has 
been found (Gozali et al., 1973). This may 
have resulted from the present design of the 
Keller plan (Sutterer & Holloway, 1975), 
which was not totally self-paced. All stu- 
dents were required to take a midterm and 
final exam on specified dates, thus reducing 
their responsibility for planning and exe- 
cuting their own schedules of work. This 
may have reduced the effect of individual 
differences in time utilization. 

Another factor that sometimes influences 
the relationships between locus of control 
and performance is anxiety. Allen et al. 
(1974) found that externals expressed more 
state anxiety than internals during oral as- 
sessments. A more general finding 
(Lefcourt, 1976) is that under high-anxious 
conditions, internals will tend to exhibit a 
type of facilitating anxiety while externals 
will demonstrate more of a debilitating 
anxiety (Lefcourt, 1976). Since no rela- 
tionship between locus of control and per- 
formance was obtained in the present study, 
anxiety was probably not a significant factor 
in the outcome. 

To summarize, the evidence in this study 
suggests that locus of control is related more 
to attitudes than habits or performance in an 
academic context. This conclusion is con- 
sistent with the experimental studies of an- 
other correlational study (Ramanaiah et al., 
1975) published since the present study was 
begun. In that study, a pattern of zero-order 
correlation coefficients was obtained that 
was similar to those obtained in the present 
study. The correlations between the atti- 
tude scales and the locus of control scales 
were of higher magnitude than those be- 
tween the habit scales and locus of control 
scales. Based on the overall pattern of cor- 
relations, it was concluded that among other 
things, internals had better study habits and 
attitudes than externals and that internals 
demonstrated more overt achievement- 
striving behavior. However, controls over 
the intercorrelations of the SSHA scales 
were not exercised nor were behavioral in- 
dicators used as in the present study. 
Therefore, given a similar pattern of zero- 
order correlation coefficients, the data ob- 
tained by Ramanaiah et al. (1975) seem to 
offer additional support for the relationship 
between locus of control and academic atti- 
tudes. 

In conclusion, the present study provides 
support for an attribution theory explana- 
tion (Weiner, 1974) of the relationship be- 
tween locus of control and academic atti- 
tudes. In apparent contradiction to this 
conclusion, several of the previously re- 
viewed studies found a relationship between 
locus of control and performance. In keep- 
ing with a social learning theory explanation, 
it might be possible to attribute the results 
of those studies to the novelty of the treat- 
ment as opposed to familiarity of treatment 
when the relationship is not found, but there 
are two problems with this. The first is that 
it is difficult to define familiarity, especially 
when contrary results are obtained with 
comparable treatments. Of the four studies 
using a PSI format or a very similar format, 
two found the relationship between locus of 
control and performance (Allen et al., 1974;
Daniels & Stevens, 1976), whereas Johnson and
Croft (1975) and the present study did not.
The second reason is that when the rela-
tionship is found, the confounding between
attitudes and habits (and other sources of
confounding related to the locus of control
versus stability dimensions) has not always
been controlled. In studies that experi-
mentally or statistically exercise this control, 
the relationship tends not to be found. 
Therefore, a substantial amount of evidence 
at present seems to indicate that locus of 
control is related more to academic attitudes 
than performance. However, there is cer- 
tainly a need for further research addressed 
to this theoretical issue. 
Effects of Elaborative Prompt Condition and Developmental 
Level on the Performance of Addition Problems 
by Kindergarten Children 


Ruth V. E. Grunau 


University of British Columbia, Vancouver, Canada 


The use of an elaborative process by young children in the performance of ver-
bally presented arithmetic addition problems of the form m + n = ___ was
investigated. Kindergarten children at three developmental levels of number
conservation performed best under a concrete-plus-verbal prompt condition.
When performance on items where m > n was compared with m < n, conser-
vers and nonconservers performed differently under an imaginal-plus-verbal
as compared to a verbal-only prompt condition. A static rather than dynamic
verbally described relation between stimulus sets resulted in more correct re-
sponses for two of nine Developmental Level × Prompt Condition groups.
The results suggested that an elaborative process may be used by kindergarten
children in the solution of addition word problems.


Research in paired-associate learning
has established the importance of imaginal
and verbal factors in noun-pair learning (e.g.,
Holyoak, Hogeterp, & Yuille, 1972; Paivio,
1969, 1971; Reese, 1970; Rohwer, 1970;
Rohwer, Lynch, Suzuki, & Levin, 1967;
Wolff & Levin, 1972). Rohwer (1973) has
proposed that paired-associate learning is
facilitated when a common referent, which
creates a shared meaning for the items, is
generated. He calls this process elabora-
tion. Different types of prompts, verbal and
visual, are viewed as varying along a di-
mension of likelihood of evoking elabora-
tion.

The possible use of elaboration by young
children in the performance of arithmetic
operations remains open to investigation.
The use of imagery has been mentioned as
possibly serving an important function in
mathematics learning (Skemp, 1971; Suppes,
1967; Syer, 1953). However, other than in
geometrical problems, very little experi-
mental work has been done relating use of
imagery to performance on mathematical
tasks. Hayes (1973) has carried out ex-
ploratory work with adults concerning the
use of spatial information stored in images
in solving elementary mathematical prob-
lems. Lee (1971) found that presence of
visually represented cues facilitated acqui-
sition of a mathematical rule with Grade
children. 

There are a number of studies concerned
with examining various aspects of addition
word problems, for example, Rosenthal and
Resnick (1974), Steffe (1970), and Steffe and


may provide meaning; in the case of paired- 
associate learning, it may provide shared 
meaning for an abstract number symbol. 
Initially, young children require the presence 
of objects or pictures (maximally explicit 
prompts) to give meaning to numbers. A
little later, children may no longer require a
concrete referential event in order to un- 
derstand to what units numbers refer (i.e., 
they have acquired the concept of cardinal 
number) but may have difficulty performing 
an operation with number symbols in the 
absence of any referents. At this point, if 
the child can generate his own referential 
event to provide meaning for the number 
symbols, performance of arithmetic opera- 
tions may be facilitated. 

The purpose of the present study is to 
determine whether some of the findings from 
the paired-associate studies on elaboration 
with children can be extended to the arith- 
metic operation of addition in the context of 
word problems. Performance on addition 
word problems is compared under three 
elaborative prompt conditions: concrete 
plus verbal, imaginal plus verbal, and verbal 
only. A dynamic-described relation, in 
which the stimuli move together either vi- 
sually and verbally or just verbally, was 
compared with a static-described relation, 
in which the stimuli are side by side. 

The age or developmental level at which 
subjects can utilize imagery instructions to 
generate their own referential event has been 
of interest in paired-associate learning. The 
developmental level at which this capability 
can be observed may vary depending on the 
type of task employed. The type of refer- 
ential event needed to give meaning to 
numbers may not be the same as that needed 
for relating noun pairs. It may be possible 
to generate a referential event for addition 
at an age at which it may be difficult to do so 
for a paired-associate task, for example, 
kindergarten. 

In the present study, the effect of imagery
instructions in prompting subjects to gen-
erate referential events is assessed in kin-
dergarten children at each of three devel-
opmental levels (as determined by perfor-
mance on a number conservation test de- 
rived from Piaget, 1952). This was done by 
comparing performance under imagining 
instructions (imaginal-plus-verbal prompt 
condition), listening instructions (verbal- 
only condition), and watching instructions 
(concrete-plus-verbal condition). 

The test of number conservation was used 
as a test of the concept of cardinal number. 
Some form of elaboration generated fol- 
lowing imagery instructions may aid simple 
arithmetic problem solving after the child 
has acquired the concept of cardinal number. 
Presumably, symbolic referents for numbers 
may not be evoked in or generated by the 
child until associations between sets of items 
and appropriate number symbols (i.e., the 
cardinal use of numbers) have been acquired. 
On this basis, it is hypothesized that children 
classified as conservers (i.e., children who 
show evidence that they have acquired the 
concept of the cardinal aspect of numbers) 
should best be able to utilize the imagery 
instructions. The performance of the non- 
conserving and transitional children should 
be facilitated less by imagery instructions. 


Incidental Learning and Imagery 


There is considerable evidence that im- 
agery-evoking activity is associated with 
better recall in incidental learning tasks as 
compared with nonimagery control activity 
with adults (e.g., Bower, 1971; Sheehan, 
1973; Sheehan & Neisser, 1969). In inci- 
dental learning with children, the role of 
imagery has not been studied to the same 
extent; however, there is some evidence that 
incidental imagery-evoking activity does 
facilitate learning (Goldberg, 1974; Yarmey 
& Bowen, 1972). An incidental learning task 
was included in the present study. Fol- 
lowing the addition word problems, each 
subject was asked to recall the nouns used in 
the problems. Another reason for including 
this task was to check whether the subjects 
under the imaginal-plus-verbal prompt 
condition had utilized the imagery instruc- 
tions. If the subjects generated referential 
events under the imaginal-plus-verbal 
prompt condition and not under the ver- 
bal-only condition, then noun recall should 
be better under the former as compared to 
the latter condition. 


Method 
Design 


The independent variables were developmental level
(nonconservers, transitionals, and conservers), prompt
condition (concrete plus verbal, imaginal plus verbal,
Table 1
Examples of the Addition Word Problems

    Dynamic-described relation           Static-described relation
1.  4 pigs walk to 6 pigs here           4 pigs lie there, 6 pigs here
2.  7 rabbits hop to 2 rabbits here      7 rabbits wait there, 2 rabbits here


and verbal only), described relation (dynamic and 
static), sex, and trials (with repeated measures on trials). 
A Latin square design was used with 18 orders of pre- 
sentation of the addition word problems; the same order 
was used over Trials 1 and 2. 


Subjects 


There were 108 kindergarten children (54 boys and
54 girls) from seven elementary schools in the North 
Vancouver school district who participated as subjects 
in the present experiment. The testing was carried out 
in the spring. The schools serve a middle class to upper 
middle class socioeconomic residential area. In order 
to arrive at three developmental levels with equal 
numbers of subjects of each sex at each level, it was 
necessary to administer the conservation pretest to 166 
children. The mean age of the experimental subjects
was 70.8 months, with a range of 12 months (65.0 to 77.0 
months). 


Materials 


Conservation pretest. Two sets of buttons, six red 
and six yellow, 1.9 cm in diameter, and two sets of seven 
nuts (almonds in the shell), approximately 3.2 cm in 
length were used. 

Experimental test. Three addition problems were 
used in a general warm-up task (1 + 1, 2 + 1, and 2 + 2).
Examples of the addition word problems used are pre- 
sented in Table 1. The instructions as to the nature of 
the task and the operation to be performed, that is, 
addition, were presented before the test items in the 
form of general and specific warm-up tasks. 

Fourteen pairs of parallel word problems were used. 
One problem of each pair contained a dynamic-de- 
scribed relation, the other a static-described relation. 
Each subject received 12 test items on two trials, with 
the same order on Trial 1 and Trial 2. The positive 
integers from 2 to 8 were used, with the following 
restrictions: For any problem m + n = a, m + n < 10
and m ≠ n. With respect to the sequence of the 12
experimental items, the following restrictions were
applied: For any two consecutive problems i and j,
m_i ≠ m_j, n_i ≠ n_j, and a_i ≠ a_j.
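
The item restrictions described above can be expressed as a short check. The following is only a minimal sketch: the function names, the default bounds, and the tuple representation of a problem are illustrative assumptions, not part of the original materials.

```python
from itertools import product


def valid_items(lo=2, hi=8, max_sum=10):
    """Return every (m, n) with lo <= m, n <= hi, m != n, and m + n < max_sum.

    The default bounds follow the restrictions as stated in the text;
    they are parameters so that other bounds can be tried.
    """
    return [
        (m, n)
        for m, n in product(range(lo, hi + 1), repeat=2)
        if m != n and m + n < max_sum
    ]


def sequence_ok(items):
    """True if no two consecutive problems share m, n, or the sum a = m + n."""
    for (m1, n1), (m2, n2) in zip(items, items[1:]):
        if m1 == m2 or n1 == n2 or (m1 + n1) == (m2 + n2):
            return False
    return True
```

A candidate order of presentation could be screened with `sequence_ok` before being assigned to a row of the Latin square.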

A foot-operated Lafayette timer (Model Number
20225 ADW), which measured response latency to .01 
sec, was used. As defined here, latency was the time 
elapsed from the end of the word-problem presentation 
until the subject gave a response. 

Objects. For the concrete-plus-verbal prompt condition,
three-dimensional colored wooden animals were
present, ranging in height from 1.6 cm to 3 cm. The
number of animals representing each integer in each
arithmetic word problem was glued in a row onto a strip
of green cardboard. The cardboard strips were 3.9 cm 
wide and ranged in length from 5.7 cm to 39.5 cm de- 
pending on the number of animals in the set. All ani- 
mals in the two sets in a given problem were identical 
(e.g., all the ducks were identical). 


Procedure 

Subjects were tested individually in a room at the 
school that they regularly attended. The conservation 
pretest was administered on a different day (Day 1)
preceding the experimental test (Day 2), typically
within a week. For both sessions, the subject sat op- 
posite the experimenter at a table. Following the 
conservation pretest, each subject was classified within 
one of three developmental levels (conservers, transi- 
tionals, and nonconservers). Each subject was then 
assigned randomly to one of six experimental conditions 
formed by three levels of prompt condition (concrete 
plus verbal, imaginal plus verbal, and verbal only) and 
two levels of described relation (dynamic and static). 

Conservation pretest. Four tasks using two types 
of materials and two arrangements were used. In Task 
1, two sets of buttons were presented in rows; in Task 
2, buttons in circles; in Task 3, nuts in rows; and in Task 
4, nuts in circles. The procedure was the same for each 
task, but the wording of the questions was altered ac- 
cording to the materials and arrangements. In Phase 
1, the experimenter placed the objects on the table in 
one-to-one correspondence and then asked the subject 
(e.g., in Tasks 1 and 2), “Are there as many red buttons 
as yellow buttons or are there more of one kind?" 
Phase 2 commenced only after the subject agreed that 
the initial sets in one-to-one correspondence were equal. 
In Phase 2, the experimenter moved each set so that in 
the case of rows, one was lengthened and one was 
shortened. In the case of circles, what had in Phase 1 
been two concentric circles was changed so that the 
circles were side by side, one bigger than the other. The 
subject was then asked to judge the equality of the two 
new sets and then to give an explanation. The procedure,
questions, and scoring were adapted from
Goldschmid and Bentler (1968).

Scoring procedure for conservation pretest. Each 
subject was classified as follows: A conserver exhibited 
conserving judgments and explanations across the four 
tasks (e.g., “Each row has the same number.” “If you
put them back the way they were before, they would be
the same.” “This row is shorter, but the spaces between
the buttons are smaller.”). A transitional exhibited
some conserving and some nonconserving judgments 
or explanations; the subject indicated uncertainty about 
judgments or explanations by changing his or her mind 
on some tasks. A nonconserver made nonconserving 
judgments and explanations across the four tasks (e.g.,
no explanation given or the child just described part of 
the procedure). 
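
The three-way classification can be sketched in code. This is a simplified illustration, assuming each of the four tasks is reduced to a single boolean (conserving judgment and explanation or not); the original scoring also considered changes of mind, which this sketch does not model, and the function name is hypothetical.

```python
def classify(task_results):
    """Classify a subject from four task outcomes.

    task_results: sequence of four booleans, True when the subject gave a
    conserving judgment and explanation on that task (a simplifying
    assumption; the original scoring was richer).
    """
    if len(task_results) != 4:
        raise ValueError("expected outcomes for exactly four tasks")
    if all(task_results):
        return "conserver"
    if not any(task_results):
        return "nonconserver"
    return "transitional"
```

For example, a child who conserves on the two button tasks but not on the two nut tasks would be returned as `"transitional"`.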

Experimental test. The timer was concealed to the 
left of the experimenter. All items were presented 
orally, as were the subject's responses. Corrective 
feedback was provided following each response. Following
a correct response, the experimenter said “Very
good, X was the right answer”; following an incorrect 
response, the experimenter said “You were very close, 
actually it was X." The experimenter recorded the 
subject’s response and the latency for each item. Every 
subject received the same general instructions, which 
incorporated a general warm-up task to induce a set for 
addition. This was followed by instructions specific to 
each of the three prompt conditions including two 
practice items. The experimental items were not pre- 
sented until the subject had both practice items cor- 
rect. 

Concrete-plus-verbal condition. The subject was
instructed to watch the animals. Before each test item,
the subject was instructed “Now watch this.” While
presenting the problem orally, the experimenter placed
the two sets of animal object referents on the table about
30 cm apart. In the dynamic-described relation condition,
the experimenter moved the sets of animals together
while presenting the problem. The animals were
moved into two parallel rows. In the static-described
relation condition, the sets of animals were not
moved.

Imaginal-plus-verbal condition. The subject was 
instructed to imagine the animals. Before each test 
item, the subject was instructed “Now imagine this.” 
The rationale for using the word “imagine” rather than 
more specific instructions such as “picture in your head” 
was as follows: The aim was to direct the child toward
generating a referential event, which would not neces- 
sarily involve a “mental picture” of the animals in each 
problem. 

Verbal-only condition. The subject was instructed 
to listen about the animals. Before each test item, the 
subject was instructed “Now listen to this.” 

Incidental recall of the nouns in the addition word 
problems. Following Trial 2 of the addition word 
problems, the subject was asked to recall the animals 
that had been in the problems. The subject was given 
no prior indication that this test would be included. 
The objects in the concrete-plus-verbal prompt condi- 
tion were no longer in view. 


Results 


Performance on the Addition Word 
Problems 


Although this was a fully crossed design, 
the nature of the experimental questions 
suggested that orthogonal planned com- 
parisons plus simple effects analysis would 
be the most appropriate approach. Effects 
of the experimental factors were examined 
nested within developmental level (following 
Marascuilo & Levin, 1970). 

Number of correct responses. The mean
number of correct responses as a function of
developmental level and experimental condition
is presented in Table 2. A summary
of the results of the planned comparisons for
each dependent variable is presented in
Table 3. ψ1 was statistically significant,
confirming the hypothesis that conservers
would perform better than the transitionals
and nonconservers combined. ψ2 was not
significant; therefore, the hypothesis that
transitionals would differ from nonconservers
was not confirmed. ψ3, ψ4, and ψ5 were
significant, confirming the hypothesis that
within each developmental level, subjects
would perform best under the
concrete-plus-verbal prompt condition as compared
with the imaginal-plus-verbal and
verbal-only conditions combined.

It was hypothesized that the less well
developed the child's number concept, the
more important the presence of objects
might be. The ω² statistic was computed to
compare the proportion of variance
explained by the comparison between
performance in the presence as compared to the
absence of objects at each developmental
level. For conservers, ω² = .12, indicating
that .12 of the total variance was accounted
for by the comparison for conservers. For
transitionals, ω² = .26; and for
nonconservers, ω² = .3 These values may be
interpreted as indicating that the less well
developed the children's concept of cardinal
number, the larger the difference in
performance when objects were present as
compared with absent, thus supporting the
hypothesis.

For conservers, the difference
between the imaginal-plus-verbal
and verbal-only prompt conditions was in
the predicted direction, as tested by ψ6, but
did not reach significance. For ψ7 and ψ8,
the prediction of no significant difference
between the imaginal-plus-verbal and
verbal-only prompt conditions for the
transitionals and nonconservers was supported.

The interaction of developmental level
and prompt condition was not tested with
this design; however, examination of the
means suggests little difference in
performance among the developmental levels
under the concrete-plus-verbal prompt
condition, with means of 10.54 (conservers),
10.13 (transitionals), and 10.42
(nonconservers). Therefore, the significantly
better performance of the conservers
as compared with the other two developmental
levels combined, as found with ψ1,
must be located in the imaginal-plus-verbal
and verbal-only conditions. It appears that
the difference between the means for the
conservers and the other two developmental
levels may be greater under imaginal plus
verbal, with means of 7.96 and 4.63,
respectively, than under verbal only, with means of
6.25 and 4.75, respectively, thus supporting
the hypothesis that conservers may be better
able to utilize imagery instructions than the
transitionals or the nonconservers.

Table 2
Mean Number of Correct Responses and Standard Deviations as a Function of Developmental
Level and Experimental Condition

Developmental     Concrete plus verbal         Imaginal plus verbal         Verbal only
level             Dynamic  Static  Combined    Dynamic  Static  Combined    Dynamic  Static  Combined    Overall
Conservers
  M               10.75    10.33   10.54       7.83     8.08    7.96        6.42     6.08    6.25        8.25
  SD                               1.41                         2.73                         3.72
Transitionals
  M               10.33    9.92    10.13       4.00     4.58    4.29        3.25     8.25    5.75        6.72
  SD                               1.51                         3.49                         3.72
Nonconservers
  M               10.33    10.50   10.42       2.50     7.42    4.96        4.08     3.42    3.75        6.38
  SD                               1.31                         3.79                         2.66
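
The comparison of conservers with the other two developmental levels can be recomputed from the combined means reported in Table 2. The sketch below is illustrative only; the dictionary layout and the function name are assumptions introduced here, and the values are the combined cell means from the table.

```python
# Combined (dynamic and static) cell means from Table 2.
COMBINED_MEANS = {
    "conservers":    {"C+V": 10.54, "I+V": 7.96, "V": 6.25},
    "transitionals": {"C+V": 10.13, "I+V": 4.29, "V": 5.75},
    "nonconservers": {"C+V": 10.42, "I+V": 4.96, "V": 3.75},
}


def conserver_advantage(condition):
    """Conserver mean minus the average of the other two levels' means."""
    others = (
        COMBINED_MEANS["transitionals"][condition]
        + COMBINED_MEANS["nonconservers"][condition]
    ) / 2
    return COMBINED_MEANS["conservers"][condition] - others


if __name__ == "__main__":
    for cond in ("C+V", "I+V", "V"):
        print(cond, round(conserver_advantage(cond), 3))
```

On these means the advantage is smallest under concrete plus verbal and largest under imaginal plus verbal, which is the pattern the paragraph above describes.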

To test the remaining effects, a 3 × 3 × 2
× 2 × 2 analysis of variance was performed
on the number of correct responses in the
addition word problems. The independent
variables were developmental level, prompt
condition, described relation, sex, and trials
(with repeated measures on trials). The
main effects of developmental level and
prompt condition (nested within developmental
level) have been accounted for by the
planned comparisons. Each of the remaining
effects and the permissible interactions
were tested at the p < .01 level of
significance. The main effect of described relation
within developmental level and prompt
condition was significant, F(9, 72) = 2.61,
with means as presented in Table 2. All
other effects were nonsignificant. The effect
of described relation was examined
separately for each level of the nesting variables
and was statistically significant for two out
of the nine developmental level and prompt
condition groups. Transitionals under the
verbal-only prompt condition had
significantly more correct responses with
static-described as compared with
dynamic-described relations, F(1, 72) = 11.63.
Nonconservers under the imaginal-plus-verbal
prompt condition performed significantly
better with static-described as compared

Table 3
F Ratios for Orthogonal Planned Comparisons for Each Dependent Variable

                                                   No. correct                 Incidental
Source                                             responses      Latencies    noun recall
ψ1 = M_C - (M_T + M_NC)/2                          10.77*         <1           <1
ψ2 = M_T - M_NC                                    <1             <1           <1
ψ3 = M_C(C+V) - [M_C(I+V) + M_C(V)]/2              14.65*         <1           3.12
ψ4 = M_T(C+V) - [M_T(I+V) + M_T(V)]/2              32.31*         <1           6.29*
ψ5 = M_NC(C+V) - [M_NC(I+V) + M_NC(V)]/2           45.58*         <1           10.18*
ψ6 = M_C(I+V) - M_C(V)                             2.71           <1           <1
ψ7 = M_T(I+V) - M_T(V)                             1.98           2.85         <1
ψ8 = M_NC(I+V) - M_NC(V)                           1.36           <1           1.40

Note. For F ratios, df = 1, 72. C = conservers, T = transitionals, NC = nonconservers,
(C + V) = concrete plus verbal, (I + V) = imaginal plus verbal, and (V) = verbal only.
*p < .01.
Table 4
Mean Number of Nouns Incidentally Recalled and Standard Deviations as a Function of
Developmental Level and Experimental Conditions

Developmental     Concrete plus verbal         Imaginal plus verbal         Verbal only
level             Dynamic  Static  Combined    Dynamic  Static  Combined    Dynamic  Static  Combined    Overall
Conservers
  M               5.67     6.00    5.83        4.33     5.17    4.75        3.83     4.83    4.33        4.97
  SD                               2.47                         1.60                         1.83
Transitionals
  M               5.83     7.17    6.50        4.50     4.83    4.67        4.67     4.67    4.67        5.28
  SD                               2.35                         1.61                         2.31
Nonconservers
  M               7.17     6.17    6.67        4.33     3.33    3.83        4.83     4.83    4.83        5.11
  SD                               2.15                         1.53                         2.08


with dynamic-described relations, F(1, 72) 
=11. There was no statistically significant 
difference in performance under any of the 
three prompt conditions. 
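
The eight planned comparisons listed in Table 3 are described as orthogonal. With equal cell sizes, this can be checked by writing each comparison as a vector of weights over the nine developmental level by prompt condition cells and confirming that every vector sums to zero and every pair has a zero dot product. The sketch below uses integer weights proportional to the definitions in Table 3; the cell ordering and all names are assumptions made for illustration.

```python
from itertools import combinations

# Cell order: C(C+V), C(I+V), C(V), T(C+V), T(I+V), T(V), NC(C+V), NC(I+V), NC(V)
CONTRASTS = {
    "psi1": [2, 2, 2, -1, -1, -1, -1, -1, -1],  # conservers vs. T and NC combined
    "psi2": [0, 0, 0, 1, 1, 1, -1, -1, -1],     # transitionals vs. nonconservers
    "psi3": [2, -1, -1, 0, 0, 0, 0, 0, 0],      # C: concrete vs. other two prompts
    "psi4": [0, 0, 0, 2, -1, -1, 0, 0, 0],      # T: concrete vs. other two prompts
    "psi5": [0, 0, 0, 0, 0, 0, 2, -1, -1],      # NC: concrete vs. other two prompts
    "psi6": [0, 1, -1, 0, 0, 0, 0, 0, 0],       # C: imaginal vs. verbal only
    "psi7": [0, 0, 0, 0, 1, -1, 0, 0, 0],       # T: imaginal vs. verbal only
    "psi8": [0, 0, 0, 0, 0, 0, 0, 1, -1],       # NC: imaginal vs. verbal only
}


def dot(u, v):
    """Dot product of two equal-length weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def all_orthogonal(contrasts):
    """True if every contrast sums to zero and all pairs are orthogonal."""
    vectors = list(contrasts.values())
    sums_to_zero = all(sum(v) == 0 for v in vectors)
    pairwise = all(dot(u, v) == 0 for u, v in combinations(vectors, 2))
    return sums_to_zero and pairwise
```

Eight orthogonal contrasts exhaust the eight degrees of freedom among nine cells, which is why the main effects of developmental level and prompt condition need no separate test.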

Latencies. The same analyses performed 
on the number of correct responses were also 
performed separately on the latencies, 
showing no statistically significant differ- 
ences. 

Incidental recall of nouns from the ad- 
dition word problems. The eight orthogonal 
planned comparisons were also performed 
on the number of nouns incidentally recalled 
from the addition word problems. The
mean number of nouns recalled, as a function
of developmental level and experimental
conditions, is presented in Table 4. ψ1,
ψ2, and ψ3 were not significant. For the
transitionals and nonconservers, there was
a significant difference in noun recall
following the concrete-plus-verbal condition as
compared with imaginal-plus-verbal and
verbal-only conditions combined, that is, ψ4
and ψ5 were significant.


When performance following the
imaginal-plus-verbal prompt condition was
compared with verbal only (ψ6, ψ7, and ψ8),
there was no significant difference for any of
the three developmental levels. The results
of an analysis of variance indicated the ef- 
fects of described relation and sex and their 
interaction were not statistically signifi- 
cant. 


Additional Analyses 


Item type. The frequency distribution of 
correct responses by items was examined for 
Trial 1 and Trial 2 within each develop- 
mental level and prompt condition. The 
mean frequency collapsed across trials for 
each item is presented in Table 5. It ap- 
peared that under imaginal-plus-verbal and 
verbal-only conditions, the mean frequency 
of correct responses was lower for some items
fairly consistently across the three devel- 
opmental levels. What the easier problems 
all had in common was that they were of the 


Table 5
Mean Frequency Across Trials 1 and 2 and Developmental Level of Correct Responses by Item
as a Function of Prompt Condition

Prompt                                              Item
condition              1     2     3     4     5     6     7     8     9     10    11    12
Concrete plus verbal   11.2  10.2  10.2  10.3  10.7  9.5   10.0  10.7  11.3  9.7   10.8  10.3
Imaginal plus verbal   62.) Clu die Oa Dee 62 65 65 50 M. Bs
Verbal only            ee eio a e ot Tie ene 67 42 9$ :
Note. Maximum mean frequency is 12.
Table 6
Mean Number of Correct Responses, Mean Latencies, and Standard Deviations for Item Types
1 and 2 as a Function of Developmental Level and Prompt Condition

Developmental           Concrete plus verbal    Imaginal plus verbal    Verbal only
level                   Type 1    Type 2        Type 1    Type 2        Type 1    Type 2
Conservers
  Correct responses
    M                   10.42     10.67         8.75      7.17          7.33      5.17
    SD                  1.83      1.23          3.08      3.49          4.03      4.02
  Latencies
    M                   6.10      6.39          8.85      8.19          6.07      7.74
    SD                  2.31      2.03          3.09      3.26          3.91      3.61
Transitionals
  Correct responses
    M                   10.33     9.92          4.50      4.08          6.42      5.08
    SD                  1.50      1.93          3.80      3.58          3.58      4.21
  Latencies
    M                   7.31      7.66          5.75      6.20          7.60      8.44
    SD                  2.66      2.45          2.60      2.26          2.00      2.81
Nonconservers
  Correct responses
    M                   10.67     10.17         5.82      4.09          4.83      2.67
    SD                  1.23      1.90          4.09      3.87          3.74      2.06
  Latencies
    M                   7.16      7.64          7.65      8.11          6.79      8.08
    SD                  1.26      1.92          4.10      4.31          2.39      3.38


form m > n; the more difficult problems 
were of the form m < n. 

Although item type was not originally in- 
tended to be a factor in this study, it was of 
interest to reanalyze performance on the 
addition word problems with item type as an 
independent variable. There were six items 
of Type 1 (m > n) and six items of Type 2 (m 
< n) on each trial. Performance was collapsed
over trials so that 12 items of each
type were available for analysis. A 3 × 3 ×
2 × 2 × 2 analysis of variance with repeated
measures on item type was performed on the
number of correct responses and then was
performed on latencies for the addition word
problems. The mean number of correct
responses and the latencies for each item
type as a function of developmental level and
prompt condition are presented in Table 6.
All effects were tested at the p < .01 level of
significance.
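
Because the item-type scores collapse the same responses over two trials, the Type 1 and Type 2 means in Table 6 should average to the combined per-trial means in Table 2. The following sketch checks that relation for the correct-response means; the dictionary names and the rounding tolerance are assumptions introduced for this illustration.

```python
# (Type 1 mean, Type 2 mean) for correct responses, from Table 6.
ITEM_TYPE_MEANS = {
    ("conservers", "C+V"): (10.42, 10.67),
    ("conservers", "I+V"): (8.75, 7.17),
    ("conservers", "V"): (7.33, 5.17),
    ("transitionals", "C+V"): (10.33, 9.92),
    ("transitionals", "I+V"): (4.50, 4.08),
    ("transitionals", "V"): (6.42, 5.08),
    ("nonconservers", "C+V"): (10.67, 10.17),
    ("nonconservers", "I+V"): (5.82, 4.09),
    ("nonconservers", "V"): (4.83, 2.67),
}

# Combined per-trial means for the same cells, from Table 2.
PER_TRIAL_MEANS = {
    ("conservers", "C+V"): 10.54,
    ("conservers", "I+V"): 7.96,
    ("conservers", "V"): 6.25,
    ("transitionals", "C+V"): 10.13,
    ("transitionals", "I+V"): 4.29,
    ("transitionals", "V"): 5.75,
    ("nonconservers", "C+V"): 10.42,
    ("nonconservers", "I+V"): 4.96,
    ("nonconservers", "V"): 3.75,
}


def tables_agree(tolerance=0.011):
    """True if (Type 1 + Type 2) / 2 matches Table 2 within rounding error."""
    return all(
        abs((t1 + t2) / 2 - PER_TRIAL_MEANS[cell]) <= tolerance
        for cell, (t1, t2) in ITEM_TYPE_MEANS.items()
    )
```

The tolerance allows for two-decimal rounding in the published tables.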

Number of correct responses. The main
effect of item type within developmental
level and prompt condition was statistically
significant, F(9, 72) = 3.41. The effect of
item type was examined separately for each
level of the nesting variables. The
developmental levels and prompt conditions that
accounted for the significant main effect of
item type were as follows: For the conservers
under the verbal-only prompt condition,
F(1, 72) = 8.28; and for the nonconservers
under the verbal-only condition, F(1, 72) =
8.28. These results indicate that under the
verbal-only condition, both the conservers
and nonconservers had significantly more
correct responses for Type 1 (m > n) than
Type 2 (m < n). For transitionals, the
difference in performance across item type was
in the same direction as for the other
developmental levels but did not reach
significance. None of the interactions with item
type was significant. The finding of no
significant difference between Item Types 1
and 2 for conservers and nonconservers
under the imaginal-plus-verbal condition
and the finding of a significant difference
between item types under the verbal-only
condition appear to indicate that the
imaginal-plus-verbal condition enabled these
subjects to perform as well on items of Type
2 (m < n) as they did on Type 1 (m > n).

Latencies. The main effect of item type
was significant, F(9, 72) = 3.65. The
breakdown of item type across the nesting
variables of developmental level and prompt
condition indicated that the following levels
accounted for the significant main effect:
For the conservers under the
imaginal-plus-verbal condition, F(1, 72) = 7.57; for
conservers under the verbal-only condition,
F(1, 72) = 11.73; and for nonconservers
under the verbal-only condition, F(1, 72) =
6.96. It appears that for the conservers
under imaginal-plus-verbal and verbal-only
prompt conditions and for nonconservers
under the verbal-only condition,
significantly more time was taken to solve
problems of Item Type 2 (m < n) as compared to
Item Type 1 (m > n).


Discussion and Conclusions 


The purpose of the present study was to
determine if findings from paired-associate
learning on the importance of developmental
level in generating a referential event,
imagery instructions as an elaborative prompt
condition, and verbal context could be
applied to the performance of addition word
problems.

It was found that imagery instructions 
facilitated performance for the more difficult 
items but not for overall performance. It is
possible that training children to generate 
referential events may be more effective than 
minimal imagery instructions in improving 
performance on addition problems for kin- 
dergarten children. Some studies in 
paired-associate learning have found that 
although 5- to 6-year-old children can utilize 
interacting stimuli provided by the experi- 
menter, it is difficult for them to generate 
their own imaginal referents (Rohwer, 1973). 
Yuille and Catchpole (1973) found that following
imagery training, kindergarten children
could generate interactions. The results
of the present study suggest it may be
worthwhile to further investigate the use of
elaboration by young children for arithmetic
tasks using a training procedure rather than
only verbal instructions.

For the verbal context variable, that is,
described relation, the static-described
rather than dynamic-described relation
resulted in more correct responses for
transitionals under the verbal-only condition and
resulted in more correct responses for
nonconservers under the imaginal-plus-verbal
condition. There was no significant difference
in performance under dynamic as
compared with static verbal context for 
conservers. This supports the prediction 
that described relation may be less impor- 
tant for conservers than for the other two 
developmental levels. The superior per- 
formance found under the static-described 
rather than dynamic-described relation 
suggests that the analogy drawn between 
integrated stimuli in paired-associate 
learning and the dynamic verbal context 
provided here for addition problems is not 
viable. Given that the task was addition, the 
ease of counting (Wang, Resnick, & Boozer, 
1971) may have been more important than 
facilitating the transformation of two sets 
into one. 

The incidental noun recall task was in- 
cluded as a possible way of checking whether 
subjects under the imaginal-plus-verbal 
prompt condition had utilized the imagery 
instructions. It was expected that the best 
incidental noun recall would follow the 
concrete-plus-verbal condition, since both 
visual and verbal referents had been present 
during the addition problems. This pre- 
diction was confirmed for transitionals and 
nonconservers. If elaboration was generated 
in the imaginal-plus-verbal condition but not 
in the verbal-only condition, incidental noun 
recall should have been better under the 
former condition. For transitionals and 
nonconservers, there was no significant dif- 
ference in noun recall between the two 
prompt conditions that lacked object ref- 
erents. For conservers, there was no sig- 
nificant difference in noun recall across any 
of the prompt conditions. Since noun recall
for conservers was as good for the
imaginal-plus-verbal and verbal-only conditions as it
was for the concrete-plus-verbal condition,
it would be tempting to say that perhaps
conservers utilized elaboration under both
conditions where object referents were not
present. However, this interpretation is
discounted by the fact that there was no
significant difference across developmental
level for this task. Since the conservers did
not recall more nouns than the transitionals
and nonconservers under the
imaginal-plus-verbal condition or the verbal-only
condition, one cannot conclude that
conservers used elaboration under those
conditions, and the transitionals and
nonconservers did not.

The reason that conservers showed no
difference in incidental noun recall across
the three prompt conditions and also
performed no better than transitionals and
nonconservers on this task may have been
that the verbal context was not intrinsic
to performing the intentional learning task, 
that is, the addition operation. Postman 
(1964) made a distinction in incidental 
learning between tasks in which the inci- 
dental components of the total learning task 
are extrinsic to the intentional components 
as compared to tasks in which the incidental 
components are intrinsic to the intentional 
components. In the present study, the 
verbal context may have been irrelevant to 
performing the addition operation, given a 
certain level of development of the cardinal 
concept of number. This is consistent with 
the finding that described relation was a 
significant variable for transitionals and 
nonconservers but not for conservers. 

In the Yarmey and Bowen (1972) study, 
children aged 8 years to 13 years rated their 
imagery to noun and picture pairs. The in- 
cidental component, noun and picture recall, 
was therefore intrinsic to the intentional 
component of the task. Yarmey and Bowen 
did not examine developmental changes over 
age (they were comparing performance of 
retarded and normal subjects). However, 
there is evidence of a developmental increase 
in use of selective attention (e.g., Siegel & 
Stevenson, 1966). Hale and Taweel (1974) 
found a developmental improvement in the 
flexibility of attention with children aged 5 
years to 8 years. 

There was a developmental trend toward 
differentiation between situations in which 
it is useful to attend to several stimulus 
features and situations in which it is more 
advantageous to attend selectively. It would 
appear that in the present study, the con- 
servers may have directed their attention to 
the intentional component of the learning 
task, that is, the numbers, more than to the 
extrinsic incidental component, that is, the 
nouns. The transitionals and nonconservers 
may have paid more attention to the verbal 
context as compared to the conservers. The 
finding that the transitionals and
nonconservers recalled more nouns following the
concrete-plus-verbal prompt condition as
compared with the imaginal-plus-verbal and
verbal-only conditions supports the findings
of Goldberg (1974) with Grade 5 subjects
that seeing pictures in addition to verbal
materials resulted in better incidental
learning as compared to verbal materials
only. It appears that the results of the
incidental noun recall task were not useful for
interpreting whether subjects had generated
referential events under the
imaginal-plus-verbal prompt condition primarily because
the incidental stimuli may have been
extrinsic to the intentional learning task.
Presumably, subjects may generate
referential events for numbers without utilizing
the actual nouns provided, that is, without
imagining a given number of specified
animals.

This would be consistent with Yuille's
(1974) finding that facilitation of noun recall
with children in Grades 2, 4, and 6 was as
great when the verb connective changed on
each trial as when it remained the same.
Yuille interpreted these results as suggesting
that verb links may affect children's
paired-associate learning in the same way as
mediation instructions. The effect of the
verb connective is to indicate to the child
that an interaction involving the nouns is
possible. The subject may not use the
particular interaction denoted by the verb. A
similar situation may exist for addition,
especially for the conservers. The subject
may not use the event depicted in the
problem; however, the word-problem context
may indicate to the child that elaboration of
some kind can be generated for numbers.
The transitionals and nonconservers, on the
other hand, may use the event provided.


Item Type 


The introduction of item type as a factor
in the study was important in that evidence
was provided that the effect of prompt
condition interacted with the item type
involved, that is, some aspects of the addition
problems, possibly the position of the larger
integer. The finding that the position of the
larger integer, that is, m > n or m < n, may
be important in performance of addition
problems is only tentative. Other aspects of
the items were not controlled, since the item
type factor had not been part of the planned
design. For example, the size of m + n and
the difference between m and n were not
controlled across item type. The items 3 +
4, 4 + 6, and 4 + 5 may be difficult because
m and n are almost of the same magnitude
as well as being of the type m < n. However,
Groen and Parkman (1972) found that ties,
for example, 3 + 3 and 4 + 4, were
particularly easy for Grade 1 children as compared
to the other addition problems, where m +
n < 9. However, ties may be a special case.
A study in which the Type 1 and Type 2
items have the same integers and only the
position of the larger integer changed across
item type would be needed in order to
determine whether position of the larger
integer is important in addition tasks.
Further research is needed to clarify
the role of verbal context for addition
problems. The effects of imagery training and
use of elaborative prompts for different
types of arithmetic items also warrant
further examination.
A simple-structure factor analysis of test data from a sample of 149 sixth-
grade children in Norway was carried out. Broad factors were interpreted to
represent Visualization, Speediness, and Fluency, as well as Fluid and Crystal-
lized Intelligence. The results are discussed in relation to the Cattell-Horn
theory of intelligence. Relating the findings to an earlier study of Norwegian
10- to 11-year-old fourth graders, some of the complexities and inconsistencies
that characterize the evidence for and against the differentiation hypothesis
are discussed.


The theory of fluid and crystallized
intelligence (Cattell, 1963, 1971; Horn, 1966,
1968, 1970, 1975) is concerned with
concomitant variation among so-called primary
factors, hypothesizing two major kinds of
attributes, that of Fluid Intelligence (Gf) and
Crystallized Intelligence (Gc).

Factor analytic evidence in support of the
Cattell-Horn theory of Fluid and
Crystallized Intelligence has primarily been brought
forward in studies of adult performance.
Several studies have used prison inmates as
subjects (Horn, 1972a; Horn & Bramble,
1967; Horn & Cattell, 1966). A study by
Shucard and Horn (1972) used subjects
obtained from business, personal contacts,
welfare agencies, and universities, while
Rossman and Horn (1972) obtained data
from a sample of engineering and art
students with various years of college.

The above studies leave room for several
different interpretations of the two
broad factors, partly because of the large
variability in terms of age and
acculturational influences in the adult samples. In
particular, there is a need for studies where
variables such as length of schooling and age
level are controlled and varied.


There is some uncertainty as to the im- 
plication of the theory for the ability struc- 
ture in children. While Horn has suggested 
that the two forms of intelligence may be 
indistinguishable, or at least difficult to 
separate, in a fairly homogeneous sample of 
young children (Horn, 1970, 1972b), Cattell 
seems to think that the two factors would be 
clearly separable at an early age, although 
showing progressive differentiation (Cattell, 
1971). Cattell has produced three studies 
specifically designed to demonstrate the 
Gf-Gc distinction in children (Cattell, 1963,
1967a, 1967).


The last-mentioned studies are deficient 
in several respects, however. The factors 
identified as Gf and Gc are too narrow to 
represent the broad, pervasive ability factors 
posited in the Gf-Gc theory. Furthermore,
the two factors are not distinguished from 
other broad factors of intelligence, such as 
broad Visualization (Gv) and Speediness 
(Gs; see Humphreys, 1967; Undheim, 
1976). 

A study by Crawford and Nirmal (1976) 
also obtained two factors interpreted to 
represent Gf and Gc in the performances of 
ninth-grade students. The Gf factor was
rather narrowly defined, however, in that all
variables hypothesized to load the factor
represented figural reasoning. Also, no
variables representing broad Visualization 
(Gv) and Speediness (Gs) were included. 
The two factors were distinguished from 
several other broad factors, however. In
particular, the distinction between Gc and
a broad academic achievement factor would
lend support to the Cattell-Horn formulation.
In a recent study of 10- to 11-year-old
children in the fourth grade of a Norwegian 
public school (Undheim, 1976), two broad 
factors were tentatively interpreted to rep- 
resent Gf and Gc, distinct from Gv and Gs. 
All four factors correlate substantially in 
oblique simple-structure solutions. 

There is some evidence, then, that broad 
factors of intelligence defined by much the 
same variables may be distinguished in the 
performance of adults and school-age chil- 
dren. If the findings of Undheim (1976) and 
Crawford and Nirmal (1976) are replicated 
and extended, this would make it less likely 
that the Gc factor is solely a “length of 
schooling” factor, a spurious factor of si- 
multaneous age maturation, or an academic 
achievement factor reflecting formal 
knowledge in a very narrow sense. The 
present study, then, an extension of the 
previous Undheim (1976) study, is con- 
cerned with the definition of Gf and Gc as 
distinct from other broad factors of intelligence
and with the development of these
distinctions during school age.


Age Differentiation Hypothesis 


In previous adult studies, Gf and Gc have 
correlated only .16 in one study (Horn & 
Cattell, 1966) but approximately .5 in two 
other studies (Horn & Bramble, 1967; Horn, 
1972a), all three of which were studies on 
samples of prison inmates. Correlations of
.4 to .5 were also found in the Shucard and 
Horn (1972) study and by Rossman and 
Horn (1972). In the Undheim (1976) study 
of 10 to 11 year olds, Gf and Gc correlated .64 
in a blind visual solution. 

This finding would fit in with Cattell's 
notion of a progressive age-related differ- 
entiation of Gf and Gc (Cattell, 1971). Re- 
cent reviews of the literature (Anastasi, 1970; 
Reinert, 1970) indicate that research on the 
differentiation hypothesis has resulted in
more questions than answers, methodological
deficiencies and disparate methodological
perspectives probably being the main
reasons for this ambiguous state of affairs. 
In any case, the structural changes are
probably quite complex. As pointed out by
Anastasi (1967), one should expect “some
functions to become more differentiated,
others less so, with time, depending upon the
nature of intervening experiences in a particular
cultural context” (p. 303).
If differentiation is related to the level of
performance (Ferguson, 1954; Reinert,
Baltes, & Schmidt, 1965), schooling may be
expected to accelerate differentiation in
some areas of intellectual functioning.
Thus, the specialization of subjects that
commonly is introduced in the latter part of
elementary school may reduce the variance
connected with a broad intelligence factor
relative to several primary factors such as
Verbal Comprehension (V) and General
Reasoning (Rs). On the level of broad or
second-order factors, a Verbal-Educational
or Crystallized Intelligence factor may separate
from a broad Reasoning factor or Fluid
Intelligence factor some time during early
childhood because of differential reinforcing
histories regarding the importance and satisfaction
of achieving in “book learning.”
According to Cattell (1971), the two factors
should progressively differentiate during
school years.
On the other hand, school may for some
time have a homogenizing impact on
aspects of intellectual achievement, modifying
a possible increasing differentiation
related to maturation, ability, performance,
and so on (Undheim, 1976). One might
predict, then, that in a fairly homogeneous
population, a few years of schooling should
produce a tightly knit Verbal-Educational
or Crystallized Intelligence factor. Also, the
Fluid and Crystallized Intelligence factors,
if distinguishable at all, should be highly
correlated (Undheim, 1976). These predictions
received some tentative support in
a study of 10 to 11 year olds (Undheim,
1976).
In high school, however, and perhaps as
early as in the latter part of elementary
school, one should expect a greater influence
of differential parental and peer values on
general school motivation and interest; together
with the specialization of subjects,
this should halt and eventually reverse the
hypothesized integrative process in certain
areas. One should expect the variance
connected with Gc to be somewhat reduced
relative to primaries such as Verbal Comprehension
(V) and General Reasoning (Rs)
and a differentiation of Gf and Gc. In 12-
to 13-year-old Norwegian youngsters, however,
the homogenizing impact is still expected
to be strong. The prediction is that
the present study will show little or no
structural changes in respect to the Gc and
Gf factors.¹
Regarding the broad Visualization factor
(Gv), one might expect that the sensory-motor
experiences of early childhood would
have formed the factor as a separate entity
quite early. Also, traditional schooling
would seem to offer little or no teaching that
should influence the normal age-related accumulation
of visual-spatial experience.
One should therefore expect only slight
structural changes during school years.

The nature of the broad Speediness factor 
(Gs)—the ability to handle material of rel- 
atively little relational complexity under 
speeded conditions—is not clear. It has 
been suggested that it stems from a test- 
taking effortfulness or from a more physio- 
logically based capacity (Horn & Cattell, 
1966). School work traditionally offers quite 
a bit of speediness training, stressing also the 
distinction between power and speed con- 
ditions, informally teaching different strat- 
egies in response to such conditions. There 
are thus reasons to expect a progressive 
separation of Gs from the other broad factors 
during school years, in particular from Gc,
the other broad factor closely related to 
schooling. 

Many features of design influence the 
factorial solutions: the sampling of vari- 
ables, the reliabilities of variables, the sam- 
pling of subjects, the rotational methods 
used, and so on. It would thus seem im- 
portant to study the age-related develop- 
ment of broad factors of intelligence in 
samples of subjects that are comparable in 
terms of geographical and sociocultural 
background, using a representative set of 
variables and similar factor analytic meth- 
ods. The present study should be closely 
comparable to the Undheim (1976) study on 
10- to 11-year-old children on most features,
although the two studies employ somewhat 
different sets of variables.
Subjects


The subjects were 149 sixth-grade children at two 
public schools in Trondheim, Norway. The 92 girls and 
57 boys had a mean age of 12 years 10 months, ranging 
from 12 to about 14 years; 85% of the subjects were be- 
tween 12 years 4 months and 13 years 4 months. The 
self-selection involved in volunteering for 6 hours of 
testing time did not result in a biased sample in terms 
of parent occupation relative to the students asked to 
participate, but girls were somewhat overrepresented 
in the sample. The public primary schools of Norway
include more than 96% of the registered population of 
12- to 13-year-old children (Central Bureau of Statistics 
of Norway, 1974a, 1974b). 


Measures and Procedures 


Thirty tests were administered, most of which were
modeled after brief descriptions given in Thurstone 
(1938) and in Guilford and Hoepfner (1971), and were 
constructed to be appropriate for the particular age and 
language group. Two tests, Series and Matrices, were 
from the Culture Fair Intelligence Test, Scale 2 (Cattell 
& Cattell, 1960). The measures (see Table 1) were se- 
lected to represent several primary factors, as measured 
in adults and adolescents, and to allow for a possible 
identification of the broad factors of Fluid and Crys- 
tallized Intelligence, Visualization, Speediness, and 
Fluency. The tests were administered over three sessions
to groups of 10 to 15 subjects. Data were collected
during May 1974.


Results 


Each item was scored for correctness. 
Table 1 presents the psychometrics of the 
tests used. For every power test (i.e., tests 
where every examinee had attempted every 
item or nearly so), the first principal com- 
ponent was computed as a basis for in- 
specting the Kuder-Richardson coefficient 
(KR-20) of sets of items. In 10 out of 20
power scales, some items were eliminated, in
which case the KR-20, through some capitalization
on chance, will be a slight overestimation
of the internal consistency of the scale.
The KR-20 of the original scale is given for
comparison. Intercorrelations between the
30 test variables were obtained by the
product-moment formula.
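
The reliability coefficient referred to above can be illustrated with a minimal sketch of the standard Kuder-Richardson formula 20 for items scored right or wrong. The function name, the toy response matrix, and the use of the population variance of total scores are assumptions made for illustration; the computing routine actually used in the study is not described.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (0/1) item scores.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Uses the population variance of the total scores (an assumption).
    """
    n_examinees = len(responses)
    n_items = len(responses[0])

    # Total score per examinee and its variance.
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_examinees
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_examinees

    # Sum over items of p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_examinees
        sum_pq += p * (1.0 - p)

    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical data: four examinees, three items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(toy), 2))
```

Dropping items that correlate poorly with the first principal component and then recomputing the coefficient on the same sample tends to raise it, which is the capitalization on chance noted in the text.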

The correlations were almost all positive, 


1 Norwegian students of that age are in the last grade 
of elementary school.
Table 1
Psychometric Characteristics of the Tests Used
Test used   Description   No. items   Total minutes worked   M   σ   Skew   Reliability^a
Figural Relations 
1. Matrices Culture Fair Intelligence Test, Scale 2, Forms 24 6 155 44 -8 


(Gf) A and B^ 
2. Series Culture Fair Intelligence Test, Scale 2, Forms 16(24 6 18.7 22 -21 
(Gf) A and B^ 


3. Figure Select one of five figures to complete an 16(26) 6% 121 39 
Analogies analogy. 
(Gf) 
Inductive Reasoning (I) 
4. Circle Discover the principle by which one small 17 6 99 46 -6 
Reasoning circle is blackened in each of four rows of 
(Gf) circles and dashes, then apply the rule to 
the fifth row. 
5. Number Discover the rule for a series of numbers and 21(28 6 129 50 —5 TBB] 
Series (Gf) — indicate the next number in the series. (87) 
General Reasoning (R) a 
6. Necessary Supply the needed fact that is missing in a 16 9 91 34 -4 ] 
Facts statement of an arithmetic problem. 
(Gf/Gc) 
7. Arithmetic An arithmetic reasoning test in which the 17(20) 9 97 4.2 
Reasoning computations are absurdly simple. 
(Gf/Gc)
Formal Reasoning (Rs)
8. Sentence Select a sentence that is most probably true, 14 6 86 30 —1
Selection using only the information in a given
(Gc) statement.
Verbal Comprehension (V)
9. Synonyms A multiple-choice vocabulary test 23 (36) 6 148 44
(Gc)
10. Antonyms A multiple-choice test of antonyms to given 26 (34) 5 18.9 5.0
(Gc) words
11. Verbal Assign words to one of two classes (each class 14(18) 8 74 28 Q
Classifi- is represented by four words). ae
cation
(Gc)
Spatial Orientation (S) and/or Visualization (Vz) 

12. Card Indicate whether or not each of eight figures 168 3074 
Rotation is identical to a standard Phar d ? 186 ae 
(Gv) rotated in a plane. 

13. Paper Indicate what pieces may be put together to 29 —6 bls 
Form make a certain figure. rn $ patie? i 
Board 
(Gv) 

14. Block Count the number of blocks in a pictured pile. — 30 3% 193 44 —7 3 
Counting a 
(Gv) 

15. Punched A multiple-choice test with five alternatives 24 3 138 54 -—3 65 
Holes in which subjects imagine the folding and 


(Gv) unfolding of pieces of paper.

16. Surface Indicate among four figures the one that can 18(20) 6 10.8 42 -2 84 
Develop- be made by folding a piece of paper along (.83) 
ment , dotted lines. 

(Gv) 
Speed of Closure (Cs) 

17. Street Identify pictured objects having missing 18 (20) 6 94 39 4A .82 
Gestalt segments. (.81) 
Comple- 
tion 
(Gv) 

18. Mutilated Identify words in which parts of letters are 20 (32) 5 105 3.3 Homan
Words missing. (.64)
(Gv)

Perceptual Speed (P) 

19. Letter Identify words that contain the letter a in a 150 3 55.7 14.0 6 54 
Identifi- list of words. 
cation 
(Gs) 

20. Symbol Judge whether pairs of proper names, letters, 80 5 569 11.8 1 .66, 
CNN or numbers are same or different. 

8) 

21. Identical ^ Find one of five figures that is exactly like the 60 4% 30.3 68 -4 48 
Forms key figure. 

Leo e 

Number (N) 

22. Number Adding numbers 48 2%, - 328 57 0 — 
Additions 
(Gs) 

23. Number Multiplying numbers 28 2% 153 28 —3 — 
Multiplic- ^ 
ations 

y — 0. 1 M MN E 

Motor Speed 

24. Marking Make as many "11" signs as possible. 600 4 20 273 -4 83 

Speed 
Word Fluency (Fw) sí 
P t £ -1 S 

25. Word Write down words starting with a specified 2 3u 957 77 

oa letter (letters s and ¢ used). 
T, 
4 83 6 n 

26. Word Write down three/four-letter words. 2 394 
Taming 

Gr) 
i352: 558.718 
27. Anagram Select letters from sets of six letters to make 2 5 13. 
Fluency familiar words. 


(Gr)
Ideational Fluency (Fi) 
28, Ideational List members of a broadly defined class J 4 6 42.3 9.5 6 == 
Fluency (animals, countries, flowers, and Christian 
(Gr) names). ' -. 
29. Uses (Gr)° List different uses for which common subjects 4 10 51 28 TM 


can be employed (e.g., book, red brick, car 


tire, and pen). 
30. Conse- List effects of a new and unusual event (all 3 9 45 29 T E 
quences relevant responses counted). 


(Gr)* 


Note. The analysis is based on right scores. When the item analysis led to a deletion of items, number of items as well as the
reliability of the original scale are listed in parentheses. Gf = Fluid Intelligence, Gc = Crystallized Intelligence, Gv = broad
Visualization, Gs = broad Speediness, and Gr = broad Fluency.
a Reliability estimates are Kuder-Richardson coefficients (KR-20) or based on part scores (indicated by an “s” subscript).
b From Cattell and Cattell, 1960.
c Score is the number of different uses or effects, common or remote.


ranging from -.03 to .73.² About 6% of the
correlations were within the 95% confidence 
interval of a true zero correlation. This may 
be compared with the 24% in the same in- 
terval as found by Guilford (1964) and 
Guilford and Hoepfner (1971), who admin- 
istered similar group measures to adults (and 
some adolescents). In the Undheim (1976) 
study of fourth graders, the percentage of 
zero correlations was about 15 among the 24 
scales in the initial factor analysis, but two 
memory tests accounted for about two thirds 
of the correlations in that interval. 
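
The count of correlations falling inside the 95% confidence interval of a true zero correlation can be sketched as follows. This is one common approximation, based on the Fisher z transformation, and it is an assumption that a comparable bound was used here; the function names and the short list of correlations are illustrative only.

```python
import math


def zero_correlation_bound(n, z_crit=1.96):
    """Largest |r| still inside the 95% interval around a true zero
    correlation, using the Fisher z approximation (SE = 1 / sqrt(n - 3))."""
    return math.tanh(z_crit / math.sqrt(n - 3))


def share_within_bound(correlations, n):
    """Proportion of observed correlations inside that interval."""
    bound = zero_correlation_bound(n)
    inside = sum(1 for r in correlations if abs(r) <= bound)
    return inside / len(correlations)


# With the 149 subjects of the present study the bound is about .16.
print(round(zero_correlation_bound(149), 2))

# Hypothetical correlations, not values from the study.
print(share_within_bound([-0.03, 0.10, 0.25, 0.73], 149))
```

Under this approximation, a correlation matrix from 149 subjects counts a coefficient as indistinguishable from zero when its absolute value is below roughly .16.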


Data Analysis 


A total of 29 tests were hypothesized to 
measure five broad ability factors. The 
Marking Speed Test was included to delimit 
a possible Speediness factor from simple 
motor speed. Several number-of-factor 
criteria were examined for the 30 by 30 cor- 
relation matrix. These seemed to indicate 
that five or six factors might be retained, 
with the Kaiser-Dickman-Guttman root- 
one criterion indicating five factors.³ On the
basis of these criteria and the hypothesis as 
outlined above, it was decided to extract and 
rotate five factors. The five factors were 
extracted using a principal factor procedure 
and with iterations until communalities 
converged. 


Orthogonal quartimax and varimax rota: 
tions were obtained as well as oblique solu 
tions using direct oblimin and promax base 
on the orthogonal solutions (see Harman 
1968; Hendrickson & White, 1964). Also, 
blind visual rotations to oblique simpl 
structure were carried out, starting with thi 
principal factor matrix. 


Interpretation 


The first factor in the quartimax solution, 
as expected, was a very broad factor, repre- 


were reflected in the corresponding oblique 
promax solutions. 


3 The roots of the first 15 principal components were
1.62, 1.42, 1.26, .94, .83, .79, T1,
.73, .70, .63, .59, .58, and .52.

Table 2
Oblique Reference Vector Solution: Broad Factors

                                          Oblique reference vector structure
Test                                      Gf    Gs    Gv    Gr    Gc
1. Matrices (Gf)                          48    09    03   -03   -09
2. Series (Gf)                            33   -06    06    12    00
3. Figure Analogies (Gf)                  43    01   -05    00    08
4. Circle Reasoning (Gf)                  34   [illegible]  01   -01    17
5. Number Series (Gf)                     28    16    07    03    12
6. Necessary Facts (Gc and Gf)            19    02    09   -04    40
7. Arithmetic Reasoning (Gc and Gf)       45   -05    11   -10    18
8. Sentence Selection (Gc and Gf)        [illegible]  -04   -05    04    53
9. Synonyms (Gc)                         -02    25   -03   -01    50
10. Antonyms (Gc)                         09    08   -08    05    51
11. Verbal Classification (Gc)            03    17   -01    26    15
12. Card Rotation (Gv)                   -13    23    44   -15    22
13. Paper Form Board (Gv)                 13    09    50   -03   -09
14. Block Counting (Gv)                   06   -07    55    00   -03
15. Punched Holes (Gv)                    14    07    48    06   -10
16. Surface Development (Gv)              22   -06    39    03   -05
17. Street Gestalt Completion (Gv)       -09   -15    38    12   -01
18. Mutilated Words (Gv)                 -04    03    34    15   -02
19. Letter Identification (Gs)            05    54    01   -09    03
20. Symbol Identities (Gs)               -04    64    03   -01    06
21. Identical Forms (Gs)                 -09    22    33    09    01
22. Number Additions (Gs)                 21    34    06    16   [illegible]
23. Number Multiplications (Gs)           28    35   -10    11   -14
24. Marking Speed                        -06    21   -05    21    08
25. Word Fluency (Gr)                    -07   -02    04    56    03
26. Word Listing (Gr)                     04    06    02    51   -15
27. Anagram Fluency (Gr)                  03    01   -08    44    08
28. Ideational Fluency (Gr)              -02   -01    04    42    09
29. Uses                                 -02   -03   -08    94    21
30. Consequences                          08   -14   -03    32    37

Note. This is a blind visual solution based on principal factor solution. Decimal points have been omitted. Gf = Fluid Intelligence,
Gc = Crystallized Intelligence, Gv = broad Visualization, Gs = broad Speediness, and Gr = broad Fluency.


direct oblimin (with delta = 0), and promax 
on varimax (with power = 5 or k = 4) were 
the best solutions and almost identical in 
terms of simple structure with about 73%- 
74% of the loadings within .20 of the hyper- 
plane and 68%-70% within .15 of the hyper- 
plane of the reference vector structure. 
Interpretations will most directly be based 
on the blind visual solution. The reference 
vector structure and the factor correlations 
are presented in Tables 2 and 3. Loadings 
above .3 will be considered significant. 
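
The hyperplane counts used above to compare rotations can be sketched as the share of reference vector loadings whose absolute value falls within a given width of zero. The function name and the small loading matrix are illustrative assumptions, not the study's own values or routine.

```python
def hyperplane_share(loadings, width):
    """Proportion of loadings with absolute value no larger than `width`,
    that is, lying within `width` of the hyperplane."""
    flat = [abs(value) for row in loadings for value in row]
    return sum(1 for value in flat if value <= width) / len(flat)


# Hypothetical 3 x 3 reference vector structure (tests by factors).
example = [
    [0.48, 0.09, 0.03],
    [0.33, -0.06, 0.18],
    [0.05, 0.54, 0.01],
]

print(hyperplane_share(example, 0.20))  # share within .20 of the hyperplane
print(hyperplane_share(example, 0.15))  # share within .15 of the hyperplane
```

A higher share at a given width indicates a cleaner simple structure, which is the sense in which the three oblique solutions are described as almost identical.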
Fluid and Crystallized Intelligence. Two 
broad factors have loading patterns very 
much in accordance with the hypothesis of 
factors representing Fluid and Crystallized 
Intelligence (Gf and Gc). A more detailed 
examination of some aspects of the solution 
may be appropriate. In adult studies, tests 
of the General Reasoning factor, usually
arithmetic-type reasoning with little emphasis
on number calculations, have divided
their variance between Gc and Gf rather
evenly or with a major part of their variance
on one or the other. In 10 to 11 year olds, a
combined score of arithmetic reasoning
performances went on the Gc factor (Undheim,
1976). In the present study, Necessary
Facts loaded the factor interpreted as
Gc, and Arithmetic Reasoning loaded the
Gf factor. Tests of Deduction, also called
Formal Reasoning or Logical Evaluation and
involving premises and inferences, have
loaded Gf or Gc or both in the adult studies.
In the study of 10 to 11 year olds, Sentence
Selection, an informal test of logical inferences,
had significant loadings on the factor
interpreted as Gf. One hypothesized, however,
that as school places more emphasis on
this kind of reasoning and to some extent
“teaches” skills of handling similar information,
the task should be subject to alternative
mechanisms (Undheim, 1976; see
Horn, 1966). In the present study, then,
Sentence Selection showed a strong relationship
to Gc. The loading from a fluency
measure, Consequences, on Gc is consistent
with the results of Horn and Cattell
(1966).

Table 3
Intercorrelations Among Intelligence Attributes

Attribute                        1     2     3     4     5
1. Fluid Intelligence            —
2. Broad Speediness             42     —
3. Broad Visualization          56    24     —
4. Broad Fluency                69    52    54     —
5. Crystallized Intelligence    64    27    62    56     —

Note. This is a blind visual solution based on principal factor
solution. Decimal points have been omitted.

Broad Speediness. Two of the Percep- 
tual Speed tests and the two number tests 
went together on the factor as expected. 
Identical Forms, involving the matching of 
figures, did not load the factor as expected 
but went instead on the factor to be inter- 
preted as broad Visualization. The insig- 
nificant loading from the Marking Speed 
Test indicates that the factor is not a simple 
executive motor speed factor but rather 
seems to reflect time needed to make dis- 
criminations and to initiate, guide, and 
monitor movements accordingly. The fac- 
tor corresponds very well to the Speediness 
factor found in 10- to 11-year-old children 
and is similar to a factor in the study by 
Horn and Cattell (1966), but the latter 
Speediness factor also involved quickness in 
simple repetitive writing, indicating some 
involvement of motor speed. 

Broad Visualization. This factor corre- 
sponds very well to the broad Visualization 
factor found in a study of 10 to 11 year olds 
and in adults. The tasks that define the 
factor involve visualization in some sense 
under speeded or more powerlike condi- 
tions. 

Broad Fluency. The factor has loadings
from the three Word Fluency tests (Word
Prefixes, Word Listing, and Anagram Fluency)
as well as from the Ideational Fluency,
Uses, and Consequences tests. The factor
is similar to the dimension previously defined
as “General Fluency” (Horn & Bramble,
1967; Horn & Cattell, 1966; Rossman &
Horn, 1972), but the emphasis here is on
listing tasks with few restrictions imposed in
scoring (see Rossman & Horn, 1972).

Factor intercorrelations. The obtained 
factor intercorrelations for the blind visual 
solution are presented in Table 3; in Table 
4, the factor intercorrelations for several 
oblique solutions are presented along with 
the corresponding correlations from the 
study on 10- to 11-year-old children (Un- 
dheim, 1976). Although the different rota- 
tional solutions met the criteria for simple 
structure about equally well in each set of 
data, factor correlations are sometimes quite 
different. In particular, one notes the differences
concerning the relationship of the
Gf factor to the other broad factors, reflecting
the broadness of the Gf factor in the
different solutions. In the direct oblimin
solution, increasing the parameter delta (i 
tually decreased all factor correlations with 
Gf, while all other factor correlations in- 
creased, particularly so among factors Gc, 
Gr, and Gv. 

Assuming that the most valid comparison 
between the two studies should be done 
across rotational methods, that is, assuming 
no interaction of method with number of 
variables, or factors, or so on (see, e.g.,
Hakstian, 1971; Hakstian & Abell, 1974), the
three methods provide no support for an
increasing separation of Gf and Gc. There
is also no systematic change in the relationship
of Gv with Gf and Gc, but Gs has significantly
(p < .01) lower correlations with
Gc. The results are in agreement with the
hypotheses made.
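
A comparison of this kind between two independent samples can be sketched with the Fisher r-to-z test. The text does not state which test produced the reported significance level, so the procedure below is an assumption, as is the sample size of 150 used for the earlier study. The two correlations are the Gc-Gs values of the visual solutions (.56 among 10 to 11 year olds, .27 in the present sample of 149), and treating factor correlations like ordinary sample correlations is itself only an approximation.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Two-tailed test of the difference between two independent
    correlations via the Fisher r-to-z transformation."""
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-tailed p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# n1 = 150 is a hypothetical sample size for the earlier study.
z, p = fisher_z_difference(0.56, 150, 0.27, 149)
print(round(z, 2), p < 0.01)
```

With samples of about this size, a drop from .56 to .27 would fall below the .01 level, whereas smaller differences, such as those between the Gf-Gc correlations, would not.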

In the study of 10 to 11 year olds, no fluency
variables were included. In Table 4,
the numbers in parentheses are the factor
correlations obtained when the variables
originally hypothesized to define a broad
Fluency factor were deleted from the present
analysis and four factors were extracted and
rotated. The conclusion regarding the development
of a distinction between Gf and
Gc would seem to hold, but a progressive
differentiation of Gs from Gc is supported
(p < .05) in only one of the factor solutions.

Table 4
Intercorrelations of Broad Intelligence Factors in 10- to 11-Year-Old^a and 12- to 13-Year-Old Children

                                       Gf                Gc                Gv                Gs
Broad factor                     10-11  12-13      10-11  12-13      10-11  12-13      10-11  12-13
Fluid Intelligence
  Visual solution                  —      —
  Direct oblimin (delta = 0)       —      —
  Promax-varimax (p = 5)           —      —
Crystallized Intelligence
  Visual solution                  64     64         —      —
  Direct oblimin (delta = 0)       36     38 (35)    —      —
  Promax-varimax (p = 5)           70     56 ([illegible])  —      —
Broad Visualization
  Visual solution                  49     56         54     62         —      —
  Direct oblimin (delta = 0)       26     36 (26)    60     46 (59)    —      —
  Promax-varimax (p = 5)           56     52 (54)    [illegible]  [illegible]    —      —
Broad Speediness
  Visual solution                  52     42         56     27         41     24         —      —
  Direct oblimin (delta = 0)       25     28 (22)    53     20 (44)    37     20 (33)    —      —
  Promax-varimax (p = 5)           59     54 (49)    68     38 (49)    46     29 (46)    —      —

Note. Numbers in parentheses are factor correlations obtained when Fluency measures were excluded from the analysis and
four factors were extracted and rotated. Decimal points have been omitted. Gf = Fluid Intelligence, Gc = Crystallized Intelligence,
Gv = broad Visualization, and Gs = broad Speediness.
a Based on Undheim (1976) and reanalysis of data presented there.


It is difficult to argue from a finding of 
factor intercorrelation whether a hypothesis 
is supported. Although the two studies are
comparable in terms of the sampling of 
subjects and the multivariate methods used, 
differences in the sampling of variables may 
have influenced the relationship among the 
factors. 

The difficulty of interpretation and the
inconsistency that may arise when using
different methods of comparison is illustrated
by considering the correlations between estimated
factor scores and average correlations
between tests defining broad factors.
Factor scores were estimated by summing
standard scores of salient loaders (see
Wackwitz & Horn, 1971) in the blind visual
factor solutions of the two studies. The
correlations between Gs factor scores and the
three other broad factors are in the expected
direction, but the differences are not significant.
The correlations between factor
scores of Gf, Gc, and Gv are higher among 12
year olds than among 10 year olds, but none
of the differences are significant. The average
correlation among measures defining
Gc is .67 in the present study as compared
with .51 in the study of 10 year olds. The
corresponding correlations of the other
broad factors are not significantly different,
but the variables defining Gf also correlate
somewhat higher in the present study (.50 vs.
.41). The test variables of the two studies
have about equal reliabilities.
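
The factor score estimates described above, obtained by summing standard scores of salient loaders, can be sketched as follows. The test names and raw scores are hypothetical, and the population standard deviation is used for standardizing as an assumption; which tests count as salient for each factor would be read from the loadings in Table 2.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores to standard (z) scores."""
    m = mean(scores)
    s = pstdev(scores)
    return [(x - m) / s for x in scores]


def unit_weighted_factor_scores(salient_tests):
    """Estimate factor scores by summing the standard scores of the
    tests that load saliently on the factor.

    salient_tests: dict mapping a test name to its list of raw scores,
    with examinees in the same order in every list.
    """
    z_columns = [standardize(raw) for raw in salient_tests.values()]
    return [sum(values) for values in zip(*z_columns)]


# Hypothetical raw scores for three examinees on two tests of one factor.
gc_tests = {
    "Synonyms": [10, 15, 20],
    "Antonyms": [12, 18, 24],
}
print(unit_weighted_factor_scores(gc_tests))
```

Correlating such sums for two factors gives the factor score correlations compared across the two studies. Because each sum pools several tests, these correlations can differ noticeably from the factor correlations of the rotated solutions.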
Factor score estimates, then, only give 
slight support for the hypothesis of a Gc-Gs 
differentiation and, if anything, would 
suggest an integration of Verbal-Educa- 
tional performance as well as between Gf, 
Gc, and Gv. Also, the differentiation hy- 
pothesis is not supported by a direct analysis 
of correlations among overlapping measures 
for Gc and Gs. The average correlations 
among two tests on Gc, among four tests on 
Gs, and between Gc and Gs tasks are not 
significantly different in the two studies. 


Discussion 


The five broad factors that appeared
across several rotational methods had loading
patterns very much in accordance with
the hypothesis of factors representing Fluid
and Crystallized Intelligence as well as of
broad factors of Speediness, Visualization,
and Fluency. The present study as well as
a study of 10 to 11 year olds (Undheim, 1976)
indicates, however, that the broad factors
and the Fluid and Crystallized Intelligence
factors in particular are not well differentiated
in school-age children, as evidenced
by factor structure matrices, factor correlations,
and factor score correlations.

As there is some uncertainty in the writings
of Cattell and Horn about the implications
of their theory for the ability structure
in children, the results of the present study
may be seen as providing only qualified
support for the Cattell-Horn theory. In any
case, the present study as well as the earlier
study of 10- to 11-year-old children (Undheim,
1976) indicate that Gf and Gc may be
differentiated, albeit with some difficulty,
without differential exposure to formal education,
for example, length of schooling.
This does not preclude, of course, that length
of schooling may be a contributing factor in
the differentiation of Gf and Gc in adults.
The two studies should weaken the hy- 
pothesis that the Gc factor is a spurious ef- 
fect of age maturation, as each study was 
confined to a single grade level, and age at 
each grade level was, if anything, negatively 
related to performance (see Undheim, 1976). 
A third possibility, that Gc simply represents 
academic or formal achievement, is not likely 
considering the study by Crawford and 
Nirmal (1976), although this may or may not 
hold when youngsters are out of compulsory 
schooling. If Gc is not simply academic 
achievement, at least not in a narrow sense, 
Gf is probably not a new name for general 
intelligence. Although distinct from aca- 
demic achievement, then, the Crystallized 
Intelligence factor is more closely related to 
such achievement than is the Fluid Intelli- 
gence factor (Crawford & Nirmal, 1976). 

There remains, of course, a number of al- 
ternatives to the Cattell (1971) formulation 
of Fluid and Crystallized Intelligence. Fluid 
Intelligence is supposed to represent physi- 
ological-environmental influences as op- 
posed to the educational-experimental in- 
fluences supposedly tapped by Gc tasks. 
Performance on Gf tasks may be strongly 
influenced by strategies and concepts ac- 
quired in formal educational settings as well 
as by those learned in more informal intel- 
ligence-demanding situations. Also, al-
though Gf appears to be a rather broad fac-
tor in several studies, the quartimax rotation
of the present data shows a very broad in-
telligence factor comprised of both Gf and
Gc tasks and a much smaller factor with
secondary loadings from some Gf tasks.
The quartimax rotation did not meet criteria
for simple structure as well as a varimax
rotation, but it indicates the possibility of Gf
representing a more narrow factor. The Gf
could, for instance, represent a “blown-up”
inductive reasoning factor. Another possi-
bility is that the Gf-Gc distinction could be
the result of a differential self-instructed
“stop-and-think” strategy. In any case, Gf
tasks do seem to represent more intense
concentration and problem solving relative
to Gc tasks, the latter characterized more by
retrieval and application of general knowl-
edge.

The present study points to the need for
specific hypotheses rather than sweeping
generalizations regarding the question of
ability differentiation. Data analysis should
also illustrate rather well the paradoxes and
inconsistencies that characterize the evi-
dence for or against the hypothesis. Anal-
yses of simple-structure factor solutions
would tentatively support the hypothesis of
a differentiation of a broad Speed factor
from other broad factors of intelligence, but
otherwise little or no changes of factor cor-
relations would be supported. Estimated
factor score correlations and direct analyses
of average correlations among test variables,
however, did not seem to give much support
for the hypotheses. A proper consideration
of the concept of variables and factors (e.g.,
Cattell, 1966) may possibly reveal that this
inconsistency is more apparent than real, but
such considerations should probably await
replication and extension of the present
findings.

Automaticity of Word Recognition Under Phonics
and Whole-Word Instruction


Carl Spring


Adult subjects learned to read words spelled with novel letters under phonics
or whole-word conditions. Training was carried through several overlearning
trials, and vocalization latency of word recognition responses was measured.
On initial overlearning trials, vocalizations of phonics subjects were faster
than those of whole-word subjects. At no stage of overlearning were whole-
word subjects faster than phonics subjects. Vocalization latencies declined
as overlearning progressed at approximately equal rates for phonics and
whole-word subjects but did not reach fully automated speeds. Correlations
of vocalization latencies from overlearning trials and a baseline task indicated
that whole-word instruction resulted in word recognition mechanisms similar
to fully automated mechanisms, but phonics instruction did not.


Effects of phonics and whole-word
training on accuracy of word recognition
have been studied in the laboratory by
Bishop (1964) and by Jeffrey and Samuels
(1967). In these studies, subjects were
trained by phonics or whole-word methods
to read a list of words written with nonsense
letters. In a transfer task, subjects were
then taught to read a new list of words com-
posed of the same nonsense letters used in
the training task. In each of these studies,
performance on the transfer task was mea-
sured by an accuracy criterion. As might be
expected, only phonics training resulted in
appreciable transfer.

LaBerge and Samuels (1974), however,
have discussed the importance of carrying
word recognition training beyond accuracy
to a criterion of automaticity. According to
their model, automatic word recognition
permits the reader to devote full attention to
syntactic and semantic processes involved in
comprehension. As noted, previous studies
compared phonics and whole-word methods
using accuracy criteria. The present study,
however, uses an automaticity criterion. I
used latency of word vocalization as the
index of automaticity (Perfetti & Hogaboam,
1975). Vocalization latency was measured
over a series of training trials that extended 
beyond the trial on which perfect accuracy 
was first achieved. A decline of vocalization 
latency during overlearning trials was in- 
terpreted as evidence that responses were 
becoming more automatic in the sense that 
they required less processing capacity. 
Although response latencies do not change 
much in paired-associate learning before the 
last error, they rapidly decline to an asymptote
during overlearning trials (Millward,
1964). Since whole-word learning is
a type of paired-associate learning, similar 
decline was expected under whole-word in- 
struction. I was interested, moreover, in 
determining whether response latencies 
decline in a similar manner under phonics 
instruction, and if so, whether they decline 
at a rate comparable to that obtained under 
whole-word instruction. For some time, 
reading authorities have asserted that pho-
nics instruction results in painfully slow
“word calling,” which overloads the reader’s
short-term memory system (Anderson &
Dearborn, 1952, pp. 208-209; Smith, 1973).
If these authorities are correct, we would
expect phonics instruction, when compared
with whole-word instruction, to result in
response latencies that do not decline as
rapidly as overlearning progresses.

Figure 1. Stimulus and response materials for
whole-word and phonics groups. (Each group's card paired
its novel words with the standard words SEE, SAY, BEE, and BAY.)

I also wanted to determine which in-
structional method resulted in word recog-
nition mechanisms most similar to those
employed in skilled reading. I approached 
this question by using young adults as 
subjects, first determining their baseline 
vocalization latencies to common words 
presented in standard orthography. Fol- 
lowing this, the same subjects became “un- 
skilled readers” as they learned under pho- 
nics or whole-word instruction to read the 
same common words spelled with nonsense 
letters. I reasoned that if word recognition
mechanisms produced by whole-word in- 
struction are similar to fully automated 
mechanisms, large positive correlations of 
baseline latency with latencies from over- 
learning trials should be found for the 
whole-word group but not for the phonics 
group. In using older subjects, the present 
study was similar to Bishop’s (1964). The 
reader is reminded, however, that generali- 
zations of the results to beginning readers 
must be made with caution.


Method 
Subjects 


Subjects were 38 volunteers from undergraduate
educational psychology classes. Subjects were ran- 
domly assigned to phonics and whole-word groups, with 
5 males and 14 females in each group. 


Procedure 


-projection 
between consecutive 
presentations. Latencies to read each word were re-
corded to the nearest millisecond with a timer activated
by a voice-operated trigger. 

Instruction. Following the collection of baseline 
data, phonics or whole-word instruction was given to 
each subject. Subjects in both groups were told that 
they would be learning to read the same four words that 
were used to obtain baseline data, except that each word 
would be spelled with two nonsense letters (novel 
words). Subjects in the whole-word group were shown 
a card, illustrated in Figure 1, which contained their 
novel words and corresponding standard words. The 
nonsense letters consisted of letterlike forms developed
by Williams (1969). Whole-word subjects were asked
to observe that only four nonsense letters were used and
that the spelling was not regular: “For example, this
letter has an /s/ sound, but this different letter also has
[...]
learn these words by sounding them out.” It was also
pointed out that each of the four words had a different
first letter and that subjects could make use of this fact
to simplify the task. Our purpose was to promote
stimulus selection by encouraging whole-word subjects
to use only one nonsense letter as the functional stim-
ulus for each word. There is a good deal of evidence
that stimulus selection is typical in whole-word learning
by children and adults (Gilbert, Spring, & Sassenrath,
1977).

Subjects in the phonics group were shown a card, il-
lustrated in Figure 1, that contained their novel words
and corresponding standard words. Phonics subjects
were asked to observe that only four nonsense letters
were used and that the spelling was completely regular:
“For example, this letter always has an /s/ sound;
therefore, you should learn to read these words by
sounding them out.” Subjects in both groups were then
given an additional 15 sec to study the material on the
cards.

Training. After the instructions, each subject was
given 18 training trials. Since most of the subjects learned
to read the four words within the first few trials, most
of the trials were overlearning trials. Within each of the
trials, the four words were randomly ordered, and each
word was visually presented in a test-study format.
Subjects were instructed to respond after being shown
a test display containing a novel word by saying the
word as fast as possible if they knew it. After test dis-
plays were presented for 4 sec, they were replaced by
study displays containing the novel word that was just
tested and its corresponding standard word. Each
study display was also presented for 4 sec, after which
another test-study cycle was repeated with a different
word. The 18 training trials were divided into six blocks
of three trials, and 30-sec rests were scheduled between
consecutive blocks. The purpose of these rests was to
insure the dissipation of reactive inhibition, which
might otherwise inflate latencies and make it difficult
to assess the degree to which responses were becoming
more automatic. Displays were presented on a rear-
projection screen. Vocalization latencies were recorded
to the nearest millisecond. The voice-operated trigger
had been adjusted so that the timer would not be acti-
vated by the continuous phoneme /s/ in SEE and SAY;
thus, it was impossible for phonics subjects to activate
the timer before identifying these words completely.

Spelling test. Following training trials, subjects were
given a spelling test. The four experimental words were 
dictated, and subjects were required to write each novel 
word using the correct nonsense letters. An example 
of each nonsense letter was provided. Subjects had not 
been told that they would be given a spelling test. 

Questionnaire. Following the spelling test, subjects 
were questioned about the strategy they used during 
training. Phonics subjects were asked, “Did you sound 
out the words on every trial, or did you shift to another 
strategy on later trials?” If they acknowledged 
switching to another strategy, they were asked to de- 
scribe the strategy. Whole-word subjects were asked 
if they learned the words by concentrating on one or 
both of the nonsense letters composing each word. If 
they indicated that they concentrated primarily on one 
letter as the functional stimulus, they were asked to 
identify whether it was the first or second letter, and
whether they occasionally attended to the other
letter.


Results and Discussion 


Questionnaire and Spelling Test 


Although all of the phonics subjects ap- 
proached the learning task by sounding out 
words, 7 phonics subjects (37%) reported
switching to a whole-word strategy before
completing the training. Within the
whole-word group, 3 subjects (16%) consis-
tently used both the nonsense letters as the
functional stimulus for each word, 14
subjects (74%) concentrated primarily on the
first letter, and 2 subjects (10%) adopted the
inefficient strategy of inconsistently using 
the first letter in some words and the second 
letter in other words. Thus, most of the 
whole-word subjects (84%) engaged in some 
form of stimulus selection. Of the 14 
subjects who concentrated primarily on the 
first letter, 5 subjects reported also attending 
to the second letter before finishing the 
training. This is in agreement with a report 
by Gilbert et al. (1977) that stimulus selec- 
tion is relaxed during overlearning. 

As might be expected, none of the phonics 
subjects made spelling errors. Nine whole- 
word subjects (47%), however, made at least 
one spelling error. Second-letter errors
(28%) were considerably more frequent than
first-letter errors (7%). This serial position
effect for spelling errors is consistent with
the fact that whole-word subjects relied
primarily on the first letter of a word as its
functional stimulus. It is important to note, 
however, that three whole-word subjects who 
made no spelling errors reported that they
attended to only the first letter throughout 
training. This raises the possibility that 
stimulus selection may be relaxed without 
the subject’s awareness that he or she is at- 
tending to all components of the stimuli. 


Baseline and Training Data 


Although a comparison of phonics and 
whole-word methods on an accuracy crite- 
rion was not a primary objective of this 
study, relevant data were available and were 
analyzed. Using the first completely correct 
trial as the accuracy criterion, we found that 
the variance was significantly smaller for the 
phonics group than for the whole-word 
group, F(18, 18) = 12.7, p < .01. This was 
because all subjects in the phonics group 
achieved criterion by the seventh trial, while 
a few subjects in the whole-word group re- 
quired more trials. In fact, one whole-word 
subject did not achieve criterion until the 
final trial, and another never achieved cri- 
terion. The subject who achieved criterion 
on the final trial, incidentally, was one of the 
two subjects who used an inconsistent 
stimulus selection strategy. Inasmuch as 
variances were unequal, the difference be- 
tween mean trials to criterion was tested 
with a Mann-Whitney nonparametric test 
and was found to be significant (p < .01, 
two-tailed), with the phonics group reaching 
criterion (M = 1.9) before the whole-word 
group (M = 5.7).
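
As a minimal sketch of the two comparisons described above, the following Python fragment computes a ratio of sample variances and a Mann-Whitney U statistic for two groups of trials-to-criterion scores. The scores and the function names are hypothetical illustrations, not the data or the exact procedure of the study; significance would still have to be judged against tabled values of F and U.

```python
from statistics import variance

def variance_ratio(a, b):
    """Return the larger sample variance divided by the smaller one."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

def mann_whitney_u(a, b):
    """Return the smaller Mann-Whitney U for two samples, using midranks for ties."""
    pooled = sorted(a + b)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; each receives their average.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    rank_sum_a = sum(ranks[x] for x in a)
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)

# Hypothetical trials-to-criterion scores (not the study's data).
phonics_trials = [1, 2, 2, 3]
whole_word_trials = [2, 5, 6, 18]

f_ratio = variance_ratio(phonics_trials, whole_word_trials)
u_stat = mann_whitney_u(phonics_trials, whole_word_trials)
```

A rank-based test is used for the means because, as noted above, the group variances were unequal.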

While this result supports previous labo- 
ratory studies, it also extends their findings. 
As in the present study, one of the earlier 
laboratory studies used four words spelled 
with four nonsense letters. This study dif- 
fered from the present study, however, by 
using spelling that did not permit whole- 
word subjects to use stimulus selection. 
This resulted in maximally poor learning 
conditions for whole-word subjects (Jeffrey 
& Samuels, 1967). The study had the merit, 
however, of using kindergarten children as 
subjects. The other earlier study, which 
used adults, also favored the phonics method 
by using unfamiliar Arabic words, thus in- 
troducing a difficult response learning task 
for whole-word subjects (Bishop, 1964). 
Response learning, of course, is normally
trivial in learning to read words in one's na-
tive language.

Figure 2. Mean vocalization latencies during over-
learning trials for whole-word and phonics groups. (C
is the first completely correct trial, and C + 1, C + 2, and
so on are subsequent overlearning trials.)

To compare phonics and whole-word 
methods on an automaticity criterion, I 
computed each subject’s vocalization latency 
on the first completely correct trial (C) and 
on each overlearning trial thereafter (C+1, 
C + 2, etc.). The vocalization latency for 
Trial C was defined as the average latency to 
read the four test words in that trial. Since 
there were occasional errors or failures to 
respond on trials following the first correct 
trial, each subject's average vocalization la- 
tency for succeeding trials was computed 
from correct responses only. Group means 
were then computed for Trials C, C + 1, and 
so on. These data are shown in Figure 2. 
Inasmuch as Trial C occurred at different 
points during training for different subjects, 
the number of subjects represented at each 
point in Figure 2 decreases as overlearning 
progresses. This may be seen in the first two
columns of Table 1, which list the number of
phonics and whole-word subjects who com-
pleted Trials C, C + 1, and so on. Because
phonics subjects tended to reach Trial C
before whole-word subjects, the number of
subjects who reached any particular over-
learning trial is always greater for the pho- 
nics group. 
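
The alignment of latencies on Trial C described above can be sketched as follows. This is an illustrative Python fragment under stated assumptions: each subject is represented as a list of trials, each trial as a list of (latency in msec, correct) pairs, and all names and numbers are hypothetical rather than taken from the study.

```python
from statistics import mean

def first_correct_trial(trials):
    """Index of the first trial on which every response was correct, or None."""
    for i, trial in enumerate(trials):
        if all(ok for _, ok in trial):
            return i
    return None

def aligned_latencies(trials):
    """Mean latency of correct responses on Trial C, C + 1, and so on."""
    c = first_correct_trial(trials)
    if c is None:
        return []  # subject never reached criterion
    out = []
    for trial in trials[c:]:
        correct = [lat for lat, ok in trial if ok]
        out.append(mean(correct) if correct else None)
    return out

def group_means(subjects):
    """For each offset from Trial C, return (number of subjects, mean latency)."""
    aligned = [aligned_latencies(s) for s in subjects]
    n_offsets = max((len(a) for a in aligned), default=0)
    result = []
    for k in range(n_offsets):
        vals = [a[k] for a in aligned if len(a) > k and a[k] is not None]
        result.append((len(vals), mean(vals) if vals else None))
    return result

# Two hypothetical subjects with three trials of two words each.
subject_1 = [[(900, True), (950, False)],
             [(800, True), (900, True)],
             [(700, True), (750, False)]]
subject_2 = [[(1000, True), (1000, True)],
             [(900, True), (900, True)],
             [(800, True), (800, True)]]

summary = group_means([subject_1, subject_2])
```

Because subjects reach Trial C at different points, fewer of them contribute to later offsets, which is why the subject count is returned alongside each mean.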


It may be seen in Figure 2 that vocaliza-
tion latencies are slightly faster for phonics
than for whole-word subjects. The rates at
which vocalization latencies decline, how-
ever, appear to be similar for the two groups.
To determine whether vocalization latencies
were significantly faster for phonics than for
whole-word subjects, analyses of covariance
were computed for Trials C, C + 1, and so on.
In these analyses, the covariate was each
subject’s average baseline latency. Per-
centages of variance accounted for by the
treatment and Baseline × Treatment in-
teraction are shown in Columns 3 and 4 of
Table 1. It may be seen that significant
treatment effects were obtained on several
of the earlier trials, indicating faster re-
sponses from phonics subjects; however, this
source of variance became less important as
overlearning progressed. Several significant
Baseline × Treatment interactions were also
obtained, indicating that on some trials, the
analysis of covariance assumption of parallel
slopes was untenable. In these cases, further
analysis revealed that the advantage of
phonics over whole-word instruction was
greater for subjects with slow baseline re-
sponses than for subjects with fast baseline
responses. This source of variance also be-
came less important, however, as over-
learning progressed.

Correlations of baseline and training 
latencies are shown in Columns 5 and 6 of 
Table 1. The significant correlations ob- 
tained for whole-word subjects at practically 
every stage of overlearning suggest that 
whole-word instruction results in mecha- 
nisms similar to those employed in fully 
automated word recognition. Failure to 
obtain significant correlations for phonics 
subjects, on the other hand, suggests that
there is little similarity between fully auto-
mated recognition mechanisms and those
resulting from phonics instruction.
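
The kind of correlation reported in Columns 5 and 6 can be illustrated with a short Python sketch. It assumes a Pearson product-moment correlation between each subject's baseline latency and training latency on one overlearning trial, and the usual t transformation for testing it; the source does not state which coefficient was used, and the function names and latencies are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic, on n - 2 degrees of freedom, for testing r against zero."""
    return r * sqrt((n - 2) / (1 - r * r))

# Hypothetical latencies in msec for one group on one overlearning trial.
baseline = [480, 510, 530, 560, 600]
training = [820, 900, 870, 990, 1010]

r = pearson_r(baseline, training)
t = t_for_r(r, len(baseline))
```

A large positive r means that subjects who are slow at baseline are also slow in training, which is the pattern taken above as evidence of similar recognition mechanisms.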

It might be expected that phonics subjects
would shift to a whole-word strategy as ov-
erlearning progressed. The pattern of cor-
relations in Table 1 suggests that such a shift
is required before word recognition becomes
fully automated. In fact, however, only 7 of
19 phonics subjects reported shifting to a
whole-word strategy; and it may be seen in
Column 5 of Table 1 that such a shift is not
reflected by a corresponding shift to larger
Table 1
Analyses From the Overlearning Trials


          No. subjects             % variance                      Baseline and training correlation
Trial     Phonics   Whole word     Treatment   Baseline × Treatment     Phonics   Whole word
eu i P w E 
E. ss 17 .15* 12% .04 ind 
E E 17 .13* 22** -43 .66** 
hs n " p ME 
E Ah he Bl mt 
E y ; m 16 .14* 12% -.07 js 
P in 16 07 40 04 .60** 
E t: 16 .03 07 A3 43* 
19 15 .08 .14* -.07 49" 
4 4 10 19 13 01 404 —06 fi 
E. "n 19 13 01 02 —.05 aT 
E. 18 13 .06 01 12 29 
E y 18 13 03 01 25 ,49* 
B: " 12 00 01 .25 .62** 
Eros 7 11 03 02 24 42 
p» 1 1 .00 age 3 .61* 
Eoi —— 10 4 01 02 46 is 


Note. C is the first completely correct trial, and C + 1, C + 2, and so on are subsequent overlearning trials.
* p < .05 (two-tailed test).
** p < .01 (two-tailed test).


training. In fact, the opposite was true.
[...] selection as overlearning progressed.
The fact that only seven subjects reported
shifting to a whole-word strategy is probably
due to the extreme difficulty of learning the
novel words used with the phonics group by
a whole-word approach. The reader will
note that the novel words used with phonics
subjects did not permit stimulus selection.
It should also be noted that mean vocaliza-
tion latencies did not drop below 800 msec
during training. The mean baseline latency
of all subjects, however, was 531 msec.
Many more trials are evidently needed for
training latencies to decline to the level of
fully automated latencies. The expected
shift from a phonics to a whole-word strategy
would presumably occur during these addi-
tional trials.


Conclusions


Contrary to expectation, whole-word
training did not result in more automatic
responses during overlearning than phonics
training. Responses of subjects in the phonics group
were significantly faster during early over-
learning trials. This difference was espe-
cially evident with subjects who are normally
slow word recognizers. It would be inap-
propriate to conclude, however, that phonics
instruction generally leads to faster word
recognition responses than whole-word in-
struction, even if the conclusion were re-
stricted to early overlearning trials. It is
conceivable that the advantage of the pho-
nics group might not have been obtained if
less regularly spelled or longer words had
been used, or if subjects had been less skilled
in phonics analysis. Also, for whole-word
subjects who selected a single letter as the
functional stimulus, there was a neighboring
letter that was connected to an incompatible
response. This might have increased vo-
calization latencies, although it is hard to see
how the effect could be avoided in whole-
word learning. In any case, a more conser-
vative conclusion is that within the range of
overlearning tested in the present study,
whole-word instruction does not necessarily 
result in more automatic word recognition 
responses than phonics instruction. Not 
only was the expected speed advantage of 
whole-word instruction not obtained on any 
overlearning trial, but the rate at which 
latencies declined as overlearning progressed 
was very similar for the two groups. 
Although whole-word and phonics 
training may produce recognition responses 
of comparable automaticity on any given 
overlearning trial, they do not necessarily 
result in similar underlying recognition 
mechanisms. Correlational evidence 
suggests that within the range of overlearn- 
ing tested in the present study, the two 
methods resulted in dissimilar underlying 
mechanisms. Whole-word training resulted 
in a recognition process similar to that em-
ployed at a much later and more complete
stage of automaticity, while phonics training
did not. It is possible, of course, that dif-
ferent correlational patterns might have
been obtained if different words had been
used.

Task Structure Versus Modality of Presentation: A Study
of the Construct Validity of the Visual-Aural
Digit Span Test


Joseph K. Torgesen, Charles Bowen, and Charles Ivey


Florida State University


This study was designed to assess which of two variables, task structure or
modality of presentation, is most important in accounting for performance
differences between good and poor readers on the Visual-Aural Digit Span
Test. Fourth-grade good and poor readers were given six different digit span
tasks in which task structure (simultaneous vs. sequential presentation of dig-
its) and modality of presentation (visual vs. auditory) were varied systemati-
cally. The results suggested that task structure is a more powerful variable in
determining individual differences on this test than is modality of presenta-
tion. The relationship of these results to the basic assumptions of the diag-
nostic-prescriptive approach to remediation is discussed.


In a series of recent studies, Koppitz
(1970, 1973, 1975) has reported the devel-
opment of a measure of sequential memory
skills that appears able to differentiate be-
tween normal and reading disabled children
of equal intelligence. This task, called the
Visual-Aural Digit Span Test (VADS), is
composed of four subtests (Visual-Oral,
Visual-Written, Aural-Oral, and Aural-
Written) that assess ability to recall se-
quences of digits when the presentation and
response modes are varied systematically.

In a preliminary study with this instru-
ment, Koppitz (1973) found that fourth-
grade learning disabled children performed
deficiently on three of the four subtests when
compared to a group of normal control chil-
dren. A follow-up to this first study (Kop-
pitz, 1975) tested the performance of two
separate groups of fourth-grade learning
disabled children in comparison with a nor-
mal control group. All groups were matched
on general intelligence, and the two learning
disabled groups were differentiated by
reading skill. The results showed a clear
relationship between VADS test perfor-
mance and reading skill. The good and poor
readers among the learning disabled children
differed on three of the subtests (Visual-
Oral, Visual-Written, and Aural-Oral), and 
the poor readers performed differently than 
the control children on all four subtests. In 
both of these studies, the tests involving vi- 
sual presentation of digits showed the most 
consistent relationship to reading and school 
learning ability. 
How are these results to be interpreted in 
terms of the psychological processes involved 
in performance on the VADS test? Koppitz 
(1975) considers the test to be a measure of 
short-term memory and intra- and inter- 
sensory integration. The particularly strong 
relationship of the Visual-Oral subtest to 
reading skill was explained by pointing out 
that both reading and performance on this 
subtest involve “visual perception of printed 
symbols and the oral recall of what has been 
perceived” (Koppitz, 1975, p. 156). How- 
ever, differences in the structure of the visual 
and aural subtests suggest an alternative 
interpretive hypothesis.
[...] modality of presentation
is completely confounded with an important
structural variable: whether the digits are
presented simultaneously or in sequence.
[...] the entire se-
[...] 10 sec before
[...] test, Koppitz (1970) noted [...]
tended to use verbal rehearsal and various
organizational activities more when the
digits were presented visually than orally. 
In her analysis, the more efficient use of 
these strategies for remembering was closely 
tied to improvement with age in recall scores. 
Thus, modality of presentation may not be 
the most important variable in determining 
differences in recall between good and poor 
readers. Rather, it may be the good readers’ 
more active and efficient use of organiza- 
tional and rehearsal strategies on the visual 
tasks that contributes to the reliable differ- 
ences between good and poor readers on 
these tasks. 

This latter interpretation is consistent 
with the results of several recent studies that 
have employed an information processing 
approach to the investigation of memory 
deficits in poor readers. These studies in- 
dicate that the problems that poor readers 
experience on some memory tasks are the 
result of their failure to adapt in active, or- 
ganized, and planful ways to the information 
processing demands of the tasks. For ex- 
ample, both second- and fourth-grade 
reading disabled children have been shown 
to be deficient in the use of such basic mne- 
monic strategies as verbal labeling and re- 
hearsal (Tarver, Hallahan, & Kauffman,
1976; Torgesen & Goldman, 1977), and other
research (Torgesen, 1977a) has also found
that fourth-grade poor readers are less likely
than normal children to use organizational
strategies to aid recall. In contrast to the
consistently obtained performance differ-
ences between good and poor readers on
tasks that require the use of mnemonic
strategies for efficient performance is the
failure to find such differences on tasks for 
which there are no readily available strate- 
gies to aid performance (Perfetti & Gold- 
man, 1976). Taken together, the above 
outlined research suggests that any memory 
task that requires relatively greater use of 
efficient mnemonic strategies for good per- 
formance will be more sensitive to psycho- 
logical differences between good and poor 
readers. 

The present study was designed to assess 
which of two variables, task structure or 
modality of presentation, is most important 
in accounting for performance differences
between good and poor readers on the VADS
test. In all, six subtests were given to two
groups of fourth-grade children who differed
in reading skill. Four of the subtests were
taken from the VADS test, while the other
two, involving visual/sequential presentation
of digits, were developed specifically for this
study. If, as proposed here, task structure
is most important in accounting for perfor-
mance differences between good and poor
readers, the two visual tasks in which the
digits are presented simultaneously should
be most sensitive to differences between
children in the two reading groups. On the
other hand, if the difficulties of reading
disabled children are most clearly related to 
the modality in which digits are presented, 
the four subtests in which the digits are 
presented visually should differentiate the 
reading groups more effectively than those 
on which the digits are presented aurally. 


Method 


Subject Selection 


Subjects were selected to be as comparable as possible 
to those used in earlier research (Koppitz, 1975) with 
the VADS test. A group of 60 middle-class Caucasian 
boys, who had previously been identified by their 
teachers as good and poor readers, were screened for 
normal intelligence through the use of the Institute for 
Personality and Ability Testing (IPAT) Culture Fair 
Intelligence Test (Cattell & Cattell, 1960). The chil- 
dren were administered this test, which is entirely 
nonverbal, in small groups of 10 to 15 subjects each. 
Both forms of the IPAT were administered in separate 
sessions, and an IQ score was derived by computing the 
average of the scores obtained on both forms. Only 
those children obtaining an intelligence estimate falling 
within an average range (90 to 120) were included in the 
finalsample. Children were categorized as poor readers 
if their grade level score on the Wide Range Achieve- | 
ment Test (Jastak, Bijou, & Jastak, 1975) was 3.5 (all 
children were tested at mid-year) or below, and they had | 
been named by their teachers as having reading prob- 
lems, Children with uncorrected visual or hearing 
impairments were excluded from the sample as were 
those who showed clinical evidence of emotional im- 1 
pairment or organic brain damage. 

The means and standard deviations of intelligence
test score, reading grade level, and chronological age of
the poor readers were as follows: For IQ, M = 102.4, SD
= 7.1; for reading grade level, M = 2.5, SD = .7; and for
chronological age, M = 119, SD = 4.1. The corresponding
means and standard deviations for the good
readers were the following: For IQ, M = 103.4, SD =
6.1; for reading grade level, M = 6.3, SD = 1.0; and for
chronological age, M = 112, SD = 5.5. The reading
disabled children in the present sample were not as
severely impaired, in terms of absolute reading level, as
those studied by Koppitz (1975). However, the actual
difference in reading grade level between good and poor
readers was quite comparable to that in the earlier research.
The only pretest measure on which the groups
were significantly different was reading level, t(32) =
12.8, p < .001. There were 17 boys in each of the
reading groups.


Procedure


Each child was tested during one session, which was
held in either a mobile research trailer or a quiet room
in a school building. The testing sessions lasted approximately
20 to 25 minutes. The subtests, which
were given in a different random order to each child,
included the following:

Auditory-Oral subtest. In this test, digits were given
orally by the examiner at the rate of 1 per second, and
recall was taken orally.

Auditory-Written subtest. This test was the same
as Auditory-Oral, except that recall was written.

Visual-Oral subtest. In this test, digits were presented
in a horizontal array on a 7.6 X 12.7 cm index
card. The subject viewed the array for 10 sec and then
recalled the digits orally.

Visual-Written subtest. This test was the same as
Visual-Oral, except that recall was written.

Visual/Sequential-Oral subtest. Here, digits were
exposed sequentially using a flattened tube of paperboard
with a 2 X 2 cm window cut in it. As a strip of
paper containing sequences of digits was inserted into
the tubular strip and pulled past the window, the
numbers were presented individually at the rate of 1 per
second. Recall was taken orally.

Visual/Sequential-Written subtest. This test was
the same as Visual/Sequential-Oral, except that recall
was written.

After individual instructions for each task, the
children were presented with sequences of numbers
ranging in length from two to seven digits. Each child
began with the shortest sequence, and each successive
sequence was increased in length by one digit. A task
was discontinued either when a child failed to recall
all the digits in one sequence correctly, or after seven digits
had been recalled correctly. The score for each task was
the largest number of digits recalled correctly.


Results


Table 1
Maximum Number of Digits Recalled by Good and Poor Readers

                                          Subtest
Reading   Auditory-    Auditory-    Visual-      Visual-      Visual/Sequential-  Visual/Sequential-
group     Oral         Written      Oral         Written      Oral                Written
Good
  M       4.9          5.1          6.7          6.5          4.5                 4.6
  SD      .93          1.06         [illegible]  .51          1.20                [illegible]
Poor
  M       4.7          4.3          5.6          5.1          4.1                 4.0
  SD      .69          .59          1.23         [illegible]  .60                 .61


The mean scores for both reading groups 
on all six tasks are presented in Table 1. A 
repeated measures analysis of variance 
showed that both group, F(1, 32) = 21.7, p <
.001, and task, F(5, 160) = 40.0, p < .001,
effects were significant. In addition, there
was a significant Group X Task interaction
effect, F(5, 160) = 4.1, p < .01. In order to
help understand the Group X Task interac- 
tion, individual contrasts between reading 
groups for each task were made by means of 
the Newman-Keuls test. This analysis 
showed that differences between groups 
were significant only for the Visual-Oral (p 
< .05) and Visual-Written (p < .01) sub- 
tests. Thus, although the good readers 
scored consistently higher than the poor 
readers, the two visual tasks in which digits 
were presented simultaneously were more 
sensitive to differences between reading 
groups. Although the Visual-Oral and Visual-Written
tasks were easiest for both groups, the good readers
apparently were able to take greater advantage of the
simultaneous mode of presentation than were the
poor readers.

It is likely that the current estimate of the
Group X Task interaction effect is a conservative
one because of ceiling effects for
the good readers on both the Visual-Oral
and Visual-Written tasks. More good
readers (71%) than poor readers (41%) had
a perfect score on one or both of these tasks
(Fisher's exact test, p = .019). Thus, if sequences
of eight and nine digits had been
included in the study, it is likely that differences
between reading groups would have
been even greater on the Visual-Oral and
Visual-Written subtests. There were no
similar ceiling effects on any of the other
subtests.


Table 2
Intercorrelations Among Subtests

Subtest                            1            2            3            4            5            6
Good readers
  1. Auditory-Oral
  2. Auditory-Written              [illegible]
  3. Visual/Sequential-Oral        [illegible]  [illegible]
  4. Visual/Sequential-Written     [illegible]  [illegible]  [illegible]
  5. Visual-Oral                   -.37         [illegible]  [illegible]  [illegible]
  6. Visual-Written                .44          [illegible]  [illegible]  [illegible]  [illegible]
Poor readers
  1. Auditory-Oral
  2. Auditory-Written              .38
  3. Visual/Sequential-Oral        .09          [illegible]
  4. Visual/Sequential-Written     .45*         .69          .51
  5. Visual-Oral                   [illegible]  [illegible]  -.01         [illegible]
  6. Visual-Written                [illegible]  .53          .26          [illegible]  [illegible]

* p < .05 (one-tailed).

Another way of testing which of the two 
variables, task structure or modality of pre- 
sentation, is more important in accounting 
for individual differences in performance on 
the VADS test involves an examination of 
the intercorrelations among tasks. For ex- 
ample, if performance tends to be deter- 
mined by task structure, then correlations 
between tasks of similar structure should be 
higher than those of differing structure. On 
the other hand, if modality of presentation 
is an important variable underlying perfor- 
mance across tasks, then within-modality 
correlations should be higher than those 
across modalities. Table 2 contains correlation
matrices computed for each reading
group separately.

Since for all of the tasks, structure and
modality of presentation are completely
confounded, the clearest analysis of the
matrices is provided by comparing relationships
among tasks within and across one
dimension while holding the other dimension
constant. Thus, relationships among
tasks of similar and different structure are
compared only for cases in which modality
of presentation is different. In like manner,
comparisons with regard to modality of
presentation are made only between tasks of
different structure. Table 3 gives the mean
correlations for each kind of comparison and
also the number of correlations (of a total
possible of four) that were significant for that
particular comparison. Mean correlations
were computed by using Fisher's z transformation.
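
As an illustration of the averaging step just described, the following is a
minimal sketch, assuming the usual procedure of converting each r to Fisher's
z, averaging the z values, and converting the mean back to r. The function
name and the example values are hypothetical and are not data from the study.

```python
import math


def mean_correlation(rs):
    """Average correlation coefficients via Fisher's z transformation."""
    # r -> z: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))
    zs = [math.atanh(r) for r in rs]
    # Average in z space, then transform the mean back to r.
    return math.tanh(sum(zs) / len(zs))


# Illustrative values only (hypothetical, not taken from Table 2).
example_rs = [0.30, 0.50, 0.70]
print(round(mean_correlation(example_rs), 3))
```

Averaging in z space weights large correlations somewhat more heavily than a
simple arithmetic mean of the r values would.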

The data summarized in Table 3 clearly
support the view that task structure is an
important variable underlying individual
differences in performance on the VADS
test. The high average correlation among
tasks of similar structure but different
modality of presentation suggests that these
tasks assessed similar processing skills in the
children tested. In contrast, the relatively
low correlation among tasks in which modality
of presentation was the same indicates
a smaller influence of this variable on recall
scores.


Table 3
Mean Correlations Among Tasks Classified by
Structure and Modality of Presentation

                           Good readers            Poor readers
Relationship among         Mean   No.              Mean   No.
tasks                      r      significant      r      significant
Same structure,
  different modality       .49*   3                .44*   3
Same modality,
  different structure      .07    0                .28    1
Different modality,
  different structure      -.18   0                .25    [illegible]

* p < .05.


Discussion 


The results of this study provide support 
for the hypothesis that task structure is an 
important variable in accounting for the 
greater sensitivity of the visual subtests of 
the VADS test to psychological differences
between good and poor readers. That is, the 
consistent differences in performance be- 
tween children of different reading skills on 
the visual subtests do not appear to be due 
to the modality of presentation per se, but to 
the fact that the digits are presented simul- 
taneously. This conclusion follows from the 
fact that the differences between good and 
poor readers on the visual/sequential tasks 
(which were comparable in structure to the 
auditory tasks) were much smaller than 
those obtained when the digits were pre- 
sented simultaneously. The intercorrela- 
tions among tasks also indicate that the 
dimension of task structure contributed 
heavily to the pattern of individual differ- 
ences obtained on the various subtests. The 
tasks that were similar in structure (re- 
gardless of modality) tended to be signifi- 
cantly related to one another, while the 
correlations among tasks in which modality 
of presentation was the same were much 
smaller.

Our informal observations of the children 
as they prepared for recall on the visual tasks 
in which the digits were presented simulta- 
neously were consistent with those made by 
Koppitz (1970). Many of the children were 
quite active in their use of such mnemonic 
strategies as verbal rehearsal and “chunk- 
ing” of the digits into smaller subgroups. 
Other research (Estes, 1974) has shown that 
both rehearsal and chunking activities are 
also frequently used when subjects are re- 
quired to recall sequentially presented series 
of digits. However, it seems clear that si- 
multaneous presentation of digits would 
provide greater opportunities for the use of 
such strategies in preparing for recall, both 
because of the greater study time involved
and because the presentation of digits is essentially
subject directed. Thus, individual
differences in the tendency to employ effi- 
cient task strategies probably influenced 
performance on the tasks involving simul- 
taneous presentation of digits more than on 
those tasks in which digits were presented 
sequentially. This last statement must be 
qualified by noting that for sequential tasks, 
the greater the interval between presenta- 
tion of digits, the more opportunity there is 
for the use of such strategies as cumulative 
rehearsal (Siegel & Allik, 1973). In fact, the 
failure to find differences between good and 
poor readers on the tasks on which digits
were presented sequentially may have been 
due to the relatively short interstimulus in- 
terval used on these tasks and not to factors 
unique to sequential presentation per se. 
The results reported in this study are not 
considered to be inconsistent with other re- 
search that has found differences between 
good and poor readers on digit span tasks in 
which the digits were presented sequentially. 
For example, Senf and Freundle (Note 1) 
conducted a well-controlled study designed 
to measure the performance of fourth-grade 
good and poor readers on both aurally and 
visually presented digit span tasks. This 
research showed reliable differences in per- 
formance between groups on both the visual 
and auditory tasks, but the differences were 
not large and were significant only when 
relatively sensitive measures of recall were 
used. The point of this study is not that 
there are no performance differences be- 
tween good and poor readers on sequentially 
presented digit span tasks. In point of fact, 
one might even expect differences between 
good and poor readers on sequentially pre- 
sented tasks to appear for some of the same 
reasons they occur when digits are presented 
simultaneously: The poor readers use less 
efficient information processing strategies.
What is important in the results reported
here is that given equivalent measures of
recall, tasks that provide greater opportunity
for certain kinds of goal-directed activity in
preparation for recall appear to be more
sensitive to differences between good and
poor readers.

The implications of this study for the development
of techniques to aid poor readers
to learn more effectively are clouded by two
factors. First, it may be true that the differences
between the good and poor readers
on the Visual-Oral and Visual-Written 
subtests were due to above-average perfor- 
mance by the good readers rather than de- 
ficient performance by the poor readers. In 
other words, it may be true that a sample of 
children reading at grade level rather than 
above grade level would not perform better 
on these tasks than the poor readers. Sec- 
ond, the data are essentially correlational in 
nature. Thus, they do not establish that 
deficiencies in the skills measured by the 
Visual-Oral and Visual-Written tasks are 
the cause of the poor readers' failure to learn 
to read well. Apart from these caveats, 
however, the results of this study do have 
important implications for techniques that 
are commonly used in the diagnosis and re- 
mediation of learning problems in chil- 
dren. 

Most importantly, the conclusions of this
study indicate the complexities involved in
inferring deficiencies in specific psychological
processes on the basis of performance on
a given task. Since many approaches to the
remediation of learning problems involve 
training to ameliorate cognitive deficiencies 
that are inferred from psychological testing, 
it is important to understand clearly the 
processes that underlie performance on 
psychometric instruments. For example, 
the theoretical constructs used in this study 
to explain the poor performance of reading 
disabled children on the visual subtests of 
the VADS test may have quite different remedial
implications than those proposed by
Koppitz (1975). If reading disabled children
are generally deficient in visual information
processing, or specifically deficient in the
“visual perception of printed symbols and
the oral recall of what has been perceived”
(Koppitz, 1975, p. 156), this implies a variety 
of remedial procedures focused specifically 
on visual processing tasks. 

However, if reading disabled children 
perform poorly on tasks like the visual sub- 
tests used in this study because of their rel- 
ative failure to approach such tasks in an 
active, organized, and planful manner 
(Torgesen, 1977b), then other remedial 
procedures are indicated. Such procedures 
would not concentrate on visual tasks per se 
but would seek to improve the reading disabled
child's “management” of cognitive
resources that are themselves basically in- 
tact. Remedial efforts derived from this 
position might focus on the reading disabled 
child's information processing strategies 
themselves, or they might involve restruc- 
turing learning tasks in a way that provides 
more support for efficient information pro- 
cessing and learning. 


Reference Note 


1. Senf, G. M., & Freundle, P. C. Sequential auditory 
and visual memory in learning disabled children. 
Paper presented at the meeting of the American 
Psychological Association, Honolulu, August 1972. 


References 


Cattell, R. B., & Cattell, A. K. S. Handbook for the 
Culture Fair Intelligence Test. Champaign, Ill: 
Institute for Personality and Ability Testing, 1960. 

Estes, W. K. Learning theory and intelligence.
American Psychologist, 1974, 29, 740-749.

Jastak, J. F., Bijou, S. W., & Jastak, S. R. Wide Range
Achievement Test (Rev. ed.). Wilmington, Del:
Guidance Associates, 1975.

Koppitz, E. The visual and aural digit span test with 
elementary school children. Journal of Clinical 
Psychology, 1970, 26, 349-353. 

Koppitz, E. Visual aural digit span test performance 
of boys with emotional and learning problems. 
Journal of Clinical Psychology, 1973, 29, 463-466. 

Koppitz, E. Bender Gestalt test, visual aural digit span 
test and reading achievement. Journal of Learning 
Disabilities, 1975, 8, 154-157. 

Perfetti, C. A., & Goldman, S. R. Discourse memory 
and reading comprehension skill. Journal of Verbal 
Learning and Verbal Behavior, 1976, 14, 33-42. 

Siegel, A. W., & Allik, J. P. A developmental study of
visual and auditory short-term memory. Journal of
Verbal Learning and Verbal Behavior, 1973, 12,
409-418.

Tarver, S. G., Hallahan, D. P., & Kauffman, J. M.
Verbal rehearsal and selective attention in children
with learning disabilities: A developmental lag.
Journal of Experimental Child Psychology, 1976, 22,
375-385.

Torgesen, J. K. Memorization processes in reading
disabled children. Journal of Educational Psychology,
1977, 69, 571-578. (a)

Torgesen, J. K. The role of non-specific factors in the
task performance of learning disabled children: A
theoretical assessment. Journal of Learning Disabilities,
1977, 10, 27-34. (b)

Torgesen, J. K., & Goldman, T. Rehearsal and short- 
term memory in reading disabled children. Child 
Development, 1977, 48, 56-60. 


Received August 8, 1977


Journal of Educational Psychology 
1978, Vol. 70, No. 4, 457-462 


Evidence for a Selective Storage Mechanism 
in Prose Learning 


James M. Royer, Marcy R. Perkins, and Clifford E. Konold 
University of Massachusetts—Amherst 


This study tested the hypothesis that the same prose passage would be stored
in several different memory locations as a function of whether the passage
could be related to previously acquired knowledge. Subjects were told the
passage was about a famous or fictitious person either before or after reading
the passage. Subjects in the before-famous-person condition made more false
positive errors to high-thematic sentences in a recognition test than did the
remaining groups. These results were interpreted as supporting the hypothesis
that prior knowledge influences the storage location of prose materials.


In a number of recent papers, Royer and
his associates (Royer & Cable, 1975; Royer
& Cable, 1976; Royer, Hambleton, & Cadorette,
in press; Royer & Perkins, 1977) have
presented evidence that learning of meaningful
material can be facilitated by relating
the to-be-learned information to previously
known information. The interpretation for
these findings was that information that can
be related to previously learned material is
either readily stored in a previously established
knowledge structure or, after being
stored, is readily retrievable when recall is
requested. In contrast, information that
cannot be easily related to previously learned
material is either stored less well or is less
accessible to retrieval. Implicit in this interpretation
is the assumption that there is
a mechanism in the cognitive system that
directs the storage of incoming information
to alternative storage locations as a function
of whether the information can be related to
previously learned material. The purpose
of this article is to attempt to provide direct
evidence for the existence of a selective
storage mechanism.


An earlier version of this article was presented at the
meeting of the American Educational Research Association,
New York, April 1977. We would like to thank
Jerome Myers for his advice concerning the statistical
analyses.

Requests for reprints should be sent to James M.
Royer, Department of Psychology, University of Massachusetts,
Amherst, Massachusetts 01003.


Dooling and his associates (Dooling &
Christiaansen, 1977; Sullin & Dooling, 1974)
have invented a paradigm relevant to this
issue. They presented one group of subjects
with a passage labeled with the name of a
famous person (e.g., Adolf Hitler) and a
second group of subjects with the same
passage labeled with the name of a fictitious
character. In a subsequent sentence recognition
task, the subjects were presented
with a target sentence not contained in the
original passage (e.g., “He hated the Jews
and so persecuted them”) and were asked to
identify it as an old or new sentence. Sullin
and Dooling (1974) found that famous-person
subjects were more likely to make a false
positive error (saying old when new) to this
sentence than were fictitious-character
subjects. Further, they demonstrated that
the magnitude of this effect increased over
a 1-week retention interval. In a subsequent
study, Dooling and Christiaansen (1977)
demonstrated that the locus of this effect
was in the storage process rather than being
due to response bias.


The results of the studies by Dooling and
his associates are supportive evidence for the
existence of a selective storage mechanism.
Given that response bias has been ruled out,
a logical interpretation of the results is that
passage information labeled with a famous
person's name is integrated into a knowledge
structure containing prior knowledge about
[...] At the time of the test,
[...]ed material is confused
[...]ial, and the subject is [...]
the subject receiving the passage with a fictitious
name label is less likely to integrate
the passage material into a knowledge 
structure containing the target sentence in- 
formation and, therefore, is less likely to 
make a false positive error on the recognition 
test. 

The present study attempts to provide 
even more conclusive evidence for the exis- 
tence of a selective storage mechanism. The 
study involved presenting the same passage 
to three different groups of subjects. Two 
of the groups were presented the passage 
labeled with the name of a famous person 
(e.g., Franklin Roosevelt or Winston Chur- 
chill). A third group was presented the same 
passage labeled with the name of a fictitious 
character. The prediction was that in a
subsequent recognition task, the famous- 
person groups would be more likely to make 
false positive errors to sentences having high 
thematic value for the person named than 
would the other groups. 

A second manipulation in the study was 
designed to assure that the predicted effects 
occurred because of selective storage of the 
material rather than because of processes 
activated at the time of retrieval. This 
manipulation involved presenting the person 
labels either at the time of passage presentation
(before) or at the time of the recognition
test (after). If the predicted effects are
due to retrieval processes, both the before 
and after groups should exhibit the effects. 
Alternatively, if the effects are due to selective
storage, only the before group should
exhibit the effects.


Method


Materials


The two biographical passages (Passages 1 and 2,
respectively) used in the study are the following:


— experienced a fairly normal and adjusted childhood.
This would later be reflected in his adulthood
and thus his political career. Due to his family’s
background and prominence, he was usually
provided with what he needed to develop as a young
boy. During these years of development, he also
received an excellent education. Already he was
indicating that he had the necessary characteristics of
a qualified leader. These qualities and his notable
interest in his country set him on his way in politics.
Later, he was selected for leadership in his country
at a time when a strong leader was desperately
needed. During his term in office, he influenced
international as well as national affairs. His country
was involved in one of the great wars of the century.
His country did not stand alone during the war. An
alliance of nations was formed to fight the opposing
forces. At the end of the war, his country emerged
victorious. He would go down in history as one of
the great political leaders of the century.


— experienced a maladjusted childhood. This 
would later be reflected in his adulthood and thus 
his political career. His educational background 
was sparse, and his environment at this time was 
lacking in the essentials for proper growth. During 
these years, he acquired many attitudes that would
later surface in his rise to power. Already he was
indicating that he had the motivation of a dictator.
This motivation and notable interest in his country’s
situation set him on his way in politics. Later,
he assumed power in his country prior to the begin- 
ning of the great war. He was a ruthless dictator, 
who quickly eliminated all opposition to his rule. 
While in power, he and his party controlled all as- 
pects of life in his country. During the war, his 
country was invaded by the opposing forces. His 
country suffered greatly during the war. It would 
take years after the war to completely rebuild. He 
would go down in history as a ruthless, power-hun- 
gry man. 


The passages were developed in the following manner.
An initial version of the passages was written from
intuition with the idea in mind of making the passages
as general as possible. The passages were then provided
with a famous-person label (Roosevelt or Churchill for
Passage 1 and Hitler or Stalin for Passage 2) and were
given to small groups of subjects, who were asked to rate
each sentence in the passage on a 1- to 5-point scale. A
rating of 1 indicated that the sentence was not applicable
to the person named; a rating of 2 indicated the
sentence was slightly applicable; 3, moderately applicable;
4, very applicable; and 5, highly applicable. Each
subject received a passage labeled with the name of only
one of the persons, and any sentence that did not receive
a rating of 4 or 5 from all of the subjects was revised.
The revised versions of the passages were then tried out
with different subjects. This tryout and revision cycle
continued until all of the subjects gave every sentence
a 4 or 5 rating.
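
The revision criterion described above can be sketched as follows. This is
only an illustration of the stated rule (a sentence is revised unless every
rater gives it a 4 or 5); the function name, sentence identifiers, and ratings
are hypothetical and are not data from the study.

```python
def sentences_to_revise(ratings, minimum=4):
    """Return the sentences that at least one rater scored below `minimum`.

    `ratings` maps a sentence identifier to the list of 1-5 applicability
    ratings it received, one rating per subject.
    """
    return [
        sentence
        for sentence, scores in ratings.items()
        if any(score < minimum for score in scores)
    ]


# Hypothetical ratings from three raters for three sentences.
pilot_ratings = {
    "S1": [5, 4, 4],
    "S2": [4, 3, 5],  # one rating of 3, so this sentence would be revised
    "S3": [5, 5, 5],
}
print(sentences_to_revise(pilot_ratings))
```

Under this rule the tryout cycle ends when the returned list is empty.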

The 13-item recognition test was comprised of 7 “old”
sentences, 4 “target” distractors, and 2 “obvious” distractors.
The 7 old sentences were the last 7 sentences
from the original passages. The 4 target distractors
were developed by first having pilot subjects write down
as many outstanding attributes for each of the four famous
persons as they could. Target sentences designed
to capture the gist of the most frequently mentioned
attributes were then written. Finally, the two obvious
distractors were sentences containing information inapplicable
to any of the four famous persons.

A full-scale pilot experiment was run after the materials
had been developed. This pilot study involved
interviewing the subjects upon completion of the experiment
and resulted in revising two of the target distractor
sentences. For example, the original Hitler
distractor sentence was, “He was responsible for the
murder of millions of Jews.” Virtually none of the pilot
subjects made a false positive error to this sentence.
Upon questioning, the subjects indicated that they were
certain that the word murder did not appear in the
original passage. The two distractor sentences judged
to be faulty on the basis of subject comments (the other
sentence was about Stalin) were edited to remove vivid
language. The final versions of the distractor sentences
are given below.


Hitler: He was responsible for the persecution of 
minority groups in his country. 


Stalin: While in power, he sent many people to
labor camps in the north. 


Roosevelt: Despite his great physical handicap, he 
excelled as a political leader. 


Churchill: His brilliant oratory and unyielding will 
inspired his countrymen during their darkest hour. 


Obvious distractors: (a) He devoted the remaining 
years of his life to improving education around the 
world. (b) Before becoming a politician, he was an 
internationally known scientist. 


Design and Subjects 


The experimental design was a 2 X 6 X 2 (Type of 
Passage [Passage 1 or 2] X Passage Label [Adolph Hi- 
tler, Joseph Stalin, Gustav Stark or Winston Churchill, 
Franklin Roosevelt, Henry Rutherford] X Label Posi- 
tion [before or after]) design, with passage label nested 
within passage type. A total of 156 undergraduate 
students (13 in each group) served as subjects and re- 
ceived course credit for participating. 
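
Because passage label is nested within passage type, the design has 12 groups
rather than the 24 a fully crossed 2 X 6 X 2 layout would imply. The sketch
below enumerates the cells under that reading; the assignment of labels to
passages follows the Materials section, and the variable and function names
are hypothetical.

```python
# Labels nested within passage type, as described in the Materials section.
LABELS_BY_PASSAGE = {
    "Passage 1": ["Winston Churchill", "Franklin Roosevelt", "Henry Rutherford"],
    "Passage 2": ["Adolph Hitler", "Joseph Stalin", "Gustav Stark"],
}
LABEL_POSITIONS = ["before", "after"]
SUBJECTS_PER_GROUP = 13


def design_cells():
    """List every (passage, label, label position) group in the nested design."""
    return [
        (passage, label, position)
        for passage, labels in LABELS_BY_PASSAGE.items()
        for label in labels
        for position in LABEL_POSITIONS
    ]


cells = design_cells()
print(len(cells), "groups;", len(cells) * SUBJECTS_PER_GROUP, "subjects")
```

The group count times 13 subjects per group reproduces the reported total of
156 subjects.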


Procedure 


The subjects were run in groups ranging in size from 
1 to 18. The subjects were assigned to groups by randomly
arranging the envelopes containing the experimental
materials prior to the experiment. Upon arriving
for the experiment, the subjects were instructed
to read a short paragraph on the cover page of the pas- 
sage, which described the experiment as being con- 
cerned with human memory. In addition, this cover 
paragraph identified the passage to be read as being 
concerned with a famous person (for subjects in the 
famous-person-before conditions) or a fictitious person 
(for subjects in the remaining groups). The subjects 
were then given 30 sec to read the experimental passage. 
In the before conditions, the passage began with the 
person's name (famous or fictitious). In the after condi- 
tions, the passage began with the words “our charac- 
ter.” 

After replacing the passage in the envelope, the
subjects were given 8 minutes to complete a 36-item
vocabulary test, the purpose of which was to discourage
rehearsal of the passage materials. Following this phase
of the experiment, the subjects read a short page of instructions
pertinent to completing the recognition test.
They were informed that their task was to judge a series
of sentences as being either old or new depending on 
whether or not the sentences had appeared in the orig- 
inal passage. In addition, they were also asked to rate 
their confidence in each judgment on a 1 to 5 scale, with 
1 indicating very little confidence in the judgment and 
5 very high confidence. The final instruction asked the 
subjects to rate each sentence and to not turn back to 
a previous sentence once it had been judged and rated 
for confidence. 

Upon turning the instructions page, the subjects in 
the after conditions read a sentence that said, "The 
passage you read was about —.” Each of the test 
sentences was printed on a separate piece of paper, and 
sentences were arranged in a different random order for 
each of the subjects in the experiment. The subjects 
were given as much time as they desired to complete the 
recognition test. 


Scoring 


The dependent variable in the experiment was a 
number that combined sentence judgment and confi- 
dence in that judgment. For example, a correct judg- 
ment (old when old, or new when new) with a confidence 
rating of 5 was scored as +5. An incorrect judgment 
(new when old, or old when new) with a confidence 
rating of 3 was scored as —3. Thus, the scores ranged 
from —5 to +5 (zero excluded), with the former score 
representing highest confidence in an incorrect judgment
and the latter score indicating highest confidence
in a correct judgment.
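
The scoring rule can be expressed compactly. The following is a minimal
sketch of the rule as stated above; the function name and argument
conventions ("old"/"new" strings) are hypothetical choices made for
illustration.

```python
def score_response(judgment, truth, confidence):
    """Combine an old/new judgment with its 1-5 confidence rating.

    A correct judgment is scored as +confidence and an incorrect judgment
    as -confidence, so scores run from -5 to +5 with zero excluded.
    """
    if judgment not in ("old", "new") or truth not in ("old", "new"):
        raise ValueError("judgment and truth must be 'old' or 'new'")
    if confidence not in (1, 2, 3, 4, 5):
        raise ValueError("confidence must be an integer from 1 to 5")
    return confidence if judgment == truth else -confidence


# The two worked cases given in the text.
print(score_response("old", "old", 5))
print(score_response("old", "new", 3))
```

A false positive error on a target distractor corresponds to a judgment of
"old" when the truth is "new", which always yields a negative score.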


Results 


Target Sentences 


The mean scores on the four target sen- 
tences and the percentage of false positive 
errors are presented in Table 1. Our pre- 
dictions were that famous-person-before 
groups would score lower on their particular 
target sentence than they would on the other 
target sentences and that the after groups 
would not exhibit this effect. The data in 
Table 1 are generally consistent with these 
predictions. 

The plan of analysis for the target sentence
dependent variable involved several
phases. First, we conducted an overall
analysis of variance to determine if our predictions
were supported at a gross level.
Following encouraging results at this level, 
we conducted several additional analyses 
that provided a more direct test of our hy- 
potheses. 

The overall analysis was a 2 X 6 X 2 X 4
(Type of Passage [Passage 1 or 2] X Passage 


Table 1
Mean Ratings for Target Sentences

                 Target sentence
Passage label    Hitler  Stalin  Roosevelt  Churchill    Hitler  Stalin  Roosevelt  Churchill
Passage 1 
3 —.54 1:23 
i 3.85 3.31 1.0 1.69 5: 
ag en (9 (0) (8) (81) (23) E" g 
i 3.00 3.62 1.23 2.23 A i 
oe O (8) à) Gn 6D (23) en 
Stark 62 1.85 2.77 2.54 2.15 2.00 E 3) 
d d) ea (5 ® CME Q9 (64 
Passage 2 
31 1.54 
3.77 —.54 1.23 4.46 4.23 le 
io o. (0) (46) (31) (0) (0) Nee e 
2.0 2. 
i „31 4.23 3.38 1.08 4.85 4.54 2. 
ees ra (0) (0) (38) (0) (0) eo oe 
2.23 .08«, 
4.08 3.69 2.85, 245 3.62 4.23 
Up (0) (8) (15) (31) (0) (0) (8) (15) 


Note. The percentages of subjects who made false positive errors on the sentences appear in parentheses. 


Label [Hitler, Stalin, Stark, Churchill, 
Roosevelt, or Rutherford] X Label Position 
[before or after] X Target Sentence [Hitler, 
Stalin, Churchill, or Roosevelt]) analysis of 
variance. The first three factors are be- 
tween-subjects variables (with paragraph 
label nested within paragraph type), whereas 
the fourth is a within-subjects variable.
Support for our predictions was obtained 
from this analysis in that the interaction 
between the target sentence variable and the 
label position variable was significant, F(3,
432) = 4.38, p < .01. Additional significant
effects in this analysis were main effects for
type of passage, F(1, 144) = 15.03, p < .01, 
type of target sentence, F(3, 432) = 7.41, p 
< .01, interaction between type of passage 
and type of target sentence, F(3, 432) = 
16.53, p < .01, and interaction between label 
position and type of passage, F(1, 144) = 
5.89, p < .01. 

A more direct test of our hypotheses was 
provided in an analysis that included only 
the experimental groups. This analysis 
compared performance on the target sen- 
tence with performance on the other target 
sentence similar in tone. The paragraphs
used in the study were different in that one 
was about a dictator, whereas the other was 
about a democratically elected leader. 
Likewise, the target sentences were different 


in that the sentences for Hitler and Stalin 
mentioned attributes that were negative in 
tone, whereas the Churchill and Roosevelt 
sentences mentioned more positive attributes.
Since subjects reading a paragraph
about a dictator would probably be less likely
to make a false positive error on a sentence
mentioning a positive attribute (and vice
versa for the democratic paragraph), we felt
that the most conservative test of our hy- 
pothesis would involve a comparison be- 
tween target sentences similar in tone. 
Thus, subject performance on the Hitler 
target sentence after reading the Hitler 
passage was compared with the same 
subject’s performance on the Stalin target 
sentence. In a similar fashion, comparisons
were made between performance on the
Stalin target sentence and the Hitler sen- 
tence for the Stalin group, between the 
Roosevelt target sentence and the Churchill 
sentence for the Roosevelt group, and be- 
tween the Churchill target sentence and the 
Roosevelt sentence for the Churchill 
group. 

The design for this analysis was a 2 X 2 X
4 X 2 (Type of Passage [Passage 1 or 2] X
Label Position [before or after] X Passage
Label [Hitler, Stalin, Churchill, or Roosevelt]
X Target Sentence or Other Sentence)
analysis of variance. The first three factors
were between-subjects variables, with passage
label nested within type of paragraph.
The target sentence variable was a within-subjects
variable. Support for our hypotheses
was present in that there was a significant
interaction between the target sentence
variable and the label position variable, F(1,
96) = 6.07, p < .05. Simple effects tests for
this interaction indicated that target sentence
performance (mean = .75) was significantly
lower than other sentence performance
(mean = 2.26) in the before label position,
F(1, 96) = 8.57, p < .01, whereas the
target sentence performance (mean = 1.75)
did not differ from the other sentence performance
(mean = 1.63) in the after label
position (F < 1).
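
The pattern behind this interaction can be made concrete with the four cell means reported above. The following Python sketch simply computes the target-minus-other difference within each label position and the difference between those differences; the variable and function names are assumptions introduced for illustration, and the sketch does not reproduce the analysis of variance itself.

```python
# Cell means reported in the text, keyed by (label position, sentence type).
means = {
    ("before", "target"): 0.75,
    ("before", "other"): 2.26,
    ("after", "target"): 1.75,
    ("after", "other"): 1.63,
}


def simple_difference(position: str) -> float:
    """Target-sentence mean minus other-sentence mean within one label position."""
    return round(means[(position, "target")] - means[(position, "other")], 2)


before_diff = simple_difference("before")  # -1.51
after_diff = simple_difference("after")    # 0.12
interaction_contrast = round(before_diff - after_diff, 2)  # -1.63
print(before_diff, after_diff, interaction_contrast)
```

The large negative difference in the before position and the near-zero difference in the after position are what the significant interaction reflects.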

An additional analysis compared performance
on the target sentence alone. This
analysis was a 2 X 2 X 4 (Passage 1 or 2 X 
Before or After Label Position X Hitler, 
Stalin, Roosevelt, or Churchill Paragraph 
Label) analysis of variance, with target sen- 
tence performance as the dependent vari- 
able. This analysis indicated that target 
sentence performance for the before groups 
(mean = .75) did not differ significantly from 
the target sentence performance for the after 
groups (mean = 1.75), F(1, 96) = 2.25, p 
< .15. 

Given that the analysis of the target sen- 
tence alone did not indicate a statistically 
reliable difference between before groups 
and after groups, one further analysis was 
performed. This analysis tested the possi- 
bility that subjects in the after groups, in 
general, responded with less accuracy and/or 
confidence than did subjects in the before 
groups. A 2 X 2 X 4 (Before or After Label 
Position X Passage 1 or 2 X Hitler, Stalin, 
Roosevelt, or Churchill Passage Label) 
analysis of variance, using the mean of the 
three nontarget distractors as the dependent 
variable, indicated that there was a reliable 
difference between before groups (mean = 
3.19) and after groups (mean = 2.43), F(1,
96) = 4.89, p < .05.


Control Groups, Old Sentences, and 
Obvious Distractors 


In addition to the major analyses pre- 
sented above, several additional analyses 
were performed on the data. The first
compared target sentence performance for 
the two control groups, using a design similar 
to the ones reported earlier. There were no 
significant effects in this analysis. Two 
additional analyses identical in design to the 
overall analysis reported in the target sen- 
tence section utilized the scores for old sen- 
tences and scores on the obvious distractors 
as dependent variables. There were no re- 
liable differences in either of these analy- 
ses. 


Discussion 


The results of the study provide support 
for the existence of a selective storage 
mechanism in the human cognitive system. 
We had hypothesized that subjects receiving 
a famous-passage label prior to reading the 
passage would be likely to make false posi- 
tive errors on sentences having high thematic 
value for the person named. Conversely, we 
hypothesized that passage label would not 
affect target sentence recognition when the 
label was provided at the time of the recog- 
nition test. 

Support for these predictions was present 
in the analysis that compared target sen- 
tence scores with the scores from sentences 
similar in tone. The target sentence scores 
were significantly lower than the similar- 
tone sentence scores in the before-label- 
position groups, but the scores on the target 
and similar-tone sentences did not differ in 
the after-label-position groups. 

Our interpretation for these results is that 
subjects in the before label condition inte- 
grate the paragraph information into a pre- 
viously established knowledge structure 
containing information about the famous 
person. During the recognition test, the 
subjects are likely to retrieve previously 
stored information and are, therefore, likely 
to make false positive errors on high-the- 
matic target sentences. This effect would 
not be expected to be present in the after 
label conditions, since presumably, the 
passage information is not integrated into 
the knowledge structure containing infor- 
mation about the famous person. 

It should be emphasized that the storage 
structures we envision do not consist of static
“copies” of sensory events. Rather, we
suggest that the nature of a storage trace is 
dynamic in quality and is affected by factors 
such as the environmental context in which 
the event is experienced and the nature of 
the knowledge structure into which the event 
is integrated (cf. Royer, 1977; Royer et al., in 
press). 

The results of this study provide support 
for the implicit directed storage assumption 
in the previous research by Royer and his 
associates (e.g., Royer & Cable, 1975; Royer 
& Cable, 1976), which demonstrated that the 
learning of prose materials could be facili- 
tated by relating the materials to previously 
learned information. A remaining issue is 
whether this facilitative effect is due to su- 
perior storage of the presented information 
or superior recall after the information has 
been learned. The available information, 
though not conclusive, suggests that superior 
retrieval is the source of the effect (e.g., 
Royer, Sefkow, & Kropf, 1977). Future re- 
search should be directed at providing evi- 
dence pertinent to this issue. 


Stalking the Missing Link Between Moral
Development Stages


Gary R. Ahlskog


[...] between Kohlberg's stages of moral development
and the process of decision making utilized by 214 subjects, aged 14
to 63 years, during the resolution of hypothetical moral dilemmas. Stage of
moral development, as determined by the Ethical Reasoning Scale, was used
to classify subjects responding to the Preferential Reasoning Profile, an
instrument requiring subjects to weight reasons for pursuing a course of action.
A multiple discriminant analysis indicated that it was not possible to distinguish
moral development stages on the basis of optimum differential weighting
of the variables of the decision-making matrix. Results indicate that the
[...] is informed by a consistent rationality [...]


Enthusiasm for Kohlberg's theory of
moral development has been interrupted
recently by the realization that few psychologists
or educators understand Kohlberg
in exactly the same way. Kohlberg's (1971a,
1971b) major explications of this theory
proposed a complex relationship between
moral philosophy, developmental psychology,
and subject responses to hypothetical
moral dilemmas, though his attempted integration
does not seem to have clarified the
substantive moral changes that are supposed
to be occurring between moral development
stages. Uncertainty regarding this between-stage
change is roughly equivalent to
uncertainty about what is actually “developing.”
The need for clarity here was noted
by Aronfreed (1971), while Alston (1971) and
Peters (1971) questioned whether or not
Kohlberg's theory describes development
that is specifically moral in type or quality.

This article reviews the frequently
unacknowledged complexity of Kohlberg's
position and attempts to locate a clear definition
of between-stage change across moral
[...] Since an adequate
definition of moral development cannot be
formulated by pointing to changes in the
[...] provide empirical verification for Kohlberg's
key theoretical claim that the decision-making
process during the adjudication of
moral claims changes across moral development
stages. Lacking such verification,
the significance of a moral development
stage score was thought to remain in
doubt.


This study is based on a doctoral dissertation submitted
to Fordham University, Graduate School of
Education. The author thanks Francis J. Crowley,
mentor, for generous help of many kinds.
[...] 37 West 87th Street, New York, New York

Copyright 1978 by the American Psychological Association, Inc. 0022-0663/78/


Kohlberg's stages of moral development
do not measure or reflect a simple hierarchy
of values. A subject's invocation of concrete
values would be easy to measure but would
not support a comprehensive theory of moral
[...] higher stage values [...]
prescription [...]
dilemma (cf. Gauthier, cited in Beck, Crittenden,
& Sullivan, 1971, pp. 365-366).
Rather, Kohlberg (1971a) has argued that
[...] of logical operations not present at
[...] These logical operations [...]
reflect more and more of the [...]
of formal reasoning, involving [...]
precisely to changes in the content of personal
values but to changes in the structure
of the individual's reasoning. “Each value
is restructured in the course of development”
(Kohlberg, 1973, p. 5).

Now Kohlberg (1971b) also contends that 

“the content of moral concerns and claims is 
always welfare” (p. 63), since “there is no 
moral situation that does not involve con- 
siderations of people’s happiness or welfare 
and considerations of equal treatment be- 
tween people" (p. 59). Moral principles are 
born, therefore, from the increasing capacity 
to adjudicate welfare claims in a formally 
ideal way, where more concrete referents, 
such as egocentric sentiments or social 
stereotypes, have lost their persuasiveness 
for the developing individual. A formally 
adequate moral position requires, moreover, 
that there is only one moral principle that 
can resolve competing or conflicting claims 
to welfare, namely, the principle of justice 
(Kohlberg, 1971b, p. 63). Although arguments
about the nature of justice may go on
at all stages of development, Kohlberg 
(1971b) has proposed that a formally ade- 
quate principle of justice takes the form: 
“Consider each person's welfare equally” 
and “Consider every man’s moral claims 
equally” (p. 64). It has become possible, 
then, for Kohlberg (1971a) to conclude that
“the basic referent of the term ‘moral’ is a 
type of judgment or a type of decision- 
making process" (p. 214). 

It is not entirely clear how the scoring of 
subject responses to hypothetical moral di- 
lemmas can be said to measure a process. 
Kohlberg (cf. 1971, pp. 195-213) may have 
intended to clarify the relationship between 
content and process by referring to the con- 
cepts of differentiation and equilibrium. 
He theorized that there are discrete changes
occurring within the individual that give rise 
to different structural wholes or ways of 
comprehending the world. At Stage 1, for
example, only relative degrees of power or
status are differentiable in the individual's
reasoning. At Stage 2, the individual can
differentiate quantities of exchange, that is,
equal or unequal exchanges of reward or
penalty. By Stage 3, the individual can
differentiate the actual exchange from an
ideal exchange, and so on for higher stages
and capacities to differentiate aspects of a
moral issue.

Prior to Stage 6, however, these differen- 
tial capacities continue to lack formal and 
moral adequacy, since the individual's ver- 
sion of justice generates inconsistencies or a 
certain kind of disequilibrium. Perhaps the
guidelines for law maintainers are not reciprocally
applied to lawmakers. The idea of
respect for rules may interfere with the capacity
to evaluate those rules. An individual
may hold that a certain moral principle is 
unconditionally valid while simultaneously 
arguing that the application of that principle 
depends on cultural norms—and so on. So 
the individual who has not reached Stage 6 
may attain a partial equilibrium only to have 
it topple over into disequilibrium as new 
differentiations are made. Only at the end 
of the developmental spectrum does a mo- 
rally adequate concept of justice meet the 
psychological capacity to utilize that concept.
“The formal psychological developmental
criteria of differentiation and integration,
of structural equilibrium, map into
the formal moral criteria of prescriptiveness
and universality” (Kohlberg, 1971a, p.
4).

If the foregoing remarks accurately and 
sufficiently summarize the components of 
Kohlberg's theory, then the problem of de- 
fining the nature of between-stage change, 
that is, the problem of defining moral de- 
velopment, can be encapsulated by the fol- 
lowing question: Does a mistaken view of 
the criteria that truly contribute to human 
welfare necessarily imply an underdeveloped 
sense of justice? Words remain ambiguous 
here, even when one tries to answer within 
Kohlberg's own framework. It seems possible
to define between-stage change by
pointing to changes in the individual's differential
view of the criteria that contribute
to welfare (e.g., from quantitative satisfactions
at Stage 2 to interpersonal harmony at
Stage 3). These changes, however, do not
imply corresponding changes in the individual's
sense of fairness or rule reciprocity.
Not only could increasingly comprehensive
views of the criteria that contribute to welfare
be succinctly explained as a matter of
social learning (Hall & Davis, 1975), but such
cognitive comprehensiveness would have
little to do with an increasing use of and appreciation
for the method of equal consideration
for welfare claims. Sooner or later
this attempt to define between-stage change
will degenerate into a theory of values acquisition,
a brand of moral realism that
Kohlberg himself has consistently avoided.

It seems possible to approach the question
from the other side and argue that between-stage
change reflects an increasingly
comprehensive understanding of the principle
of justice. A sense of justice, however,
is not to be confused with abstract ideas
about moral problems. A person may make
increasingly sophisticated arguments about
the nature of right and wrong without caring
about such issues in the concrete case. A
person may abhor the imposition of arbitrary
or inconsistent rules without being able to
justify the abhorrence in a formally adequate
way (Peters, 1971, p. 261). It seems needlessly
complex and empirically unfounded
to suggest that a sense of justice as a method 
for adjudicating welfare claims develops in 
proportion to the individual’s ability to give 
formal reasons for employing the method. A 
North American teenager is under no moral 
obligation to justify his or her reasons for 
action in the style of a European intellectual 
weltanschauung (Ausubel, 1971). 

Thus far the idea that psychological cri- 
teria map into moral criteria during devel- 
opment appears to be a desirable but vague 
conclusion that provides little usable infor- 
mation. If Kohlberg is correct in asserting 
that a formally adequate prescription for 
justice takes the form, “Consider each person's
welfare and/or moral claims equally,”
then one wants to know how moral devel- 
opment can be differentiated from the in- 
dividual’s developing rationality. Any rule 
for social discourse and any prescription to 
be rational in one’s thinking requires that 
every person’s claims be given equal atten- 
tion or respect, so it becomes all the more 
important to distinguish between apparent 
metaethical principles and exhortations for 
clearheadedness (cf. Hill, 1972). Kohlberg's 
references to differentiation and equilibrium 
may simply point to the fact that new
knowledge corrects previous mistakes.
Ausubel (1971) has suggested that an implicit
principle of rationality informs all
prescription for action and eventually de- 
fines inconsistencies for the agent during the 
agent's own development. No particularly 
moral significance, however, could be at- 
tached to the agent's relative ability to jus- 
tify the importance of rationality or consis- 
tency of thought in response to hypothetical 
dilemmas. Lacking further clarification, it 
would appear that Kohlberg's definition of 
moral development as stages of differentia- 
tion and equilibrium could be used at will to 
construct additional theories of political 
development or religious development, that 
is, to construct developmental theories in 
any arena that a researcher cared to ask 
questions about. 

Having failed to derive a clear conception
of between-stage change from the possibili- 
ties noted above, there remains a third al- 
ternative described in Kohlberg's writings 
but in need of empirical verification. This 
alternative suggests that subjects at increasingly
higher stages of moral development
would be expected to engage in a decision-making
process (cf. Kohlberg, 1971a,
p. 214) that reflects the method of justice in 
an increasingly complete and ideal way. In 
empirical terms, subjects at higher stages 
should rank inadequate or inappropriate 
reasons for action less desirable in resolving 
hypothetical moral dilemmas than would 
subjects at lower stages. When the deci- 
sion-making process is conceptualized as a 
procedure during which criteria thought to 
enhance welfare are evaluated relative to 
criteria thought to harm welfare, then be- 
tween-stage change may be tentatively de- 
fined as a proportional change in the influ- 
ence ascribed to harmful criteria. More 
developed subjects should rule out more and
more of the many possible reasons for action
on the grounds that, morally speaking, such
reasons are at least irrelevant if not ultimately
unjust.


Method 


In order to verify this tentative definition of moral
development, a convenient sample was [...] of 22 [...]
in tests and [...] the Ethical Reasoning Scale and the
Preferential Reasoning Profile (discussed below), and to administer both
instruments to an additional 10 people (N = 242). This 
method of subject selection was intended to generate 
a reasonably wide variety of ages, levels of education, 
and social backgrounds. 

A total of 214 subjects completed both instruments. 
Of this total, 46.7% were male, and 53.3% were female. 
Ages of subjects ranged from 14 to 63 years, with a me- 
dian age of 31 years. Approximately 78% of the subjects 
were under the age of 40 years. The social status level 
of occupations, rated according to Warner (1957), 
showed a modal and median value of 4, corresponding 
to skilled clerics, owners of small businesses, indepen- 
dent skilled technicians, and the like. Nearly 65% of 
the sample had never been married, while an additional 
24% were currently living with a spouse. Only 21% of 
the subjects had not completed some portion of a college 
education, while 13% had completed some type of de- 
gree at the postgraduate level. The majority of subjects 
(78%) had been born within 100 miles of New York City, 
while 10.3% had been born in Puerto Rico, and the remainder
born in places scattered throughout the United
States.

The first instrument completed by each subject, the
Ethical Reasoning Scale (Sullivan, Note 1), was intended
to measure each subject's stage of moral development.
This instrument contains 54 forced-choice
items that are divided into roughly equal proportions
among three of Kohlberg's hypothetical dilemmas.
Each item requires the subject to choose between two
reasons for pursuing a course of action, where the paired
statements have been arranged to correspond to prototypic
answers for the first five of Kohlberg's moral [...]
for stealing the drug that reflect a prototypic Stage 1 [...]
response (“Life is worth [...]
druggist's profit”). Another item requires a choice
between a prototypic Stage 1 response and prototypic
Stage 4 response, each suggesting that Heinz should not
steal the drug: “Stealing brings punishment from God
and from society” versus “Heinz should not take the
matter into his own hands, but should refer it to the
[...] development.

The Ethical Reasoning Scale provides five subscale
scores corresponding to the first five of Kohlberg's moral
development stages. The reliability of each subscale,
using Kuder-Richardson 20, is given as follows: For
Stage 1, r = .63; for Stage 2, r = .47; for Stage 3, r = .41;
for Stage 4, r = .47; and for Stage 5, r = .63. Overall
reliability of the instrument as determined by Kuder-Richardson
20 is .72 (N = 479). Average item difficulty
for these prototypic statements was found to be .65 (n
= 80), implying a maximum expected reliability of .73.
When the median value for subscale reliability (.47) is
used as an indicator of construct validity, then it can be
noted that such an estimate is close to the maximum
possible value for r² (.52).
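
For readers unfamiliar with the reliability index named above, the standard Kuder-Richardson 20 formula for dichotomous items can be sketched in Python as follows. The response matrix is invented toy data, the function name `kr20` is an assumption, and the use of the population variance of total scores is one common convention; nothing here reproduces the data of the study. The last lines check only the arithmetic behind the ceiling of .52, which is the square of the overall reliability of .72.

```python
def kr20(item_matrix):
    """Kuder-Richardson 20 for 0/1 items; each row is one respondent."""
    n = len(item_matrix)
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n
    # Population variance of total scores (an assumed convention).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n  # proportion scoring 1 on item j
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)


# Invented toy data: four respondents, three items.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(toy))  # 0.75

# Squared overall reliability, the ceiling referred to in the text.
r_squared_ceiling = round(0.72 ** 2, 2)
print(r_squared_ceiling)  # 0.52
```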

Scores obtained on the Ethical Reasoning Scale indicated
that 7.2% of the 214 subjects in the study
functioned at a moral development stage characteristic
of Stage 1, 3.7% at Stage 2, 27.5% at Stage 3, 31.3% at
Stage 4, and 30.3% at Stage 5 or higher.

Immediately following the completion of the Ethical 
Reasoning Scale, all subjects completed the Preferential 
Reasoning Profile (Ahlskog, Note 2). This instrument
was designed to test for changes in the decision-making
process during the resolution of hypothetical moral
dilemmas. It contains 60 stated reasons for pursuing
or refraining from a course of action, equally divided
among three of Kohlberg's hypothetical dilemmas:
Heinz and the drug, the doctor and the wife, and the
judge and the doctor. For each section of 20 items, 5
items provide reasons for pursuing a course of action
(e.g., the judge should punish the doctor); 5 items
provide reasons for avoiding that course of action; 5
items provide reasons for pursuing the alternate course
of action (e.g., let the doctor go free); and 5 items
provide reasons for avoiding that course. Sample items
for these four categories are reported, respectively, as
follows:


1. The judge is responsible for seeing that doctors 
do not take it upon themselves to kill people when 
they think there is some good reason to do it. 


2. Even if the judge believes that the doctor was 
wrong, it would not benefit anyone to put the doctor 
in jail. 

3. Since the doctor tried to do the right thing, the
judge should not sentence him as if he were a criminal.


4. Our system of justice would be useless if judges 
ignored the law because of their personal views. 


For each of the three dilemmas of the instrument,
subjects were asked to make a decision (in yes or no
terms) about the central question of the particular dilemma
(e.g., “Should the judge punish the doctor?”) and
then to weight the items of each section from zero to six,
indicating the extent to which they endorsed the item.
In effect, the instrument generated three “decisions of
choice” and three sets of responses to the items grouped
according to the four categories noted above. By
summing the responses to each category across three
hypothetical dilemmas, cumulative scores for four
variables were obtained for each subject. These four
variables may be visualized as a payoff matrix representing
the weights each subject ascribed to (a) the
benefits derived from the decision of choice, (b) the
penalties paid for pursuing the decision of choice, (c) the
benefits derived from pursuing the alternate decision,
and (d) the penalties paid for pursuing the alternate
decision.
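
The construction of the payoff matrix can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the category names, the function name `payoff_matrix`, and the example ratings are hypothetical, and the mapping from item categories to benefits and penalties (items favoring the subject's own decision count as benefits of the decision of choice, items opposing it as penalties, and likewise for the alternate decision) is an interpretation of the description above rather than the instrument's published scoring key.

```python
def payoff_matrix(dilemmas):
    """Sum item weights into four payoff-matrix variables across dilemmas.

    `dilemmas` is a list of (decision, ratings) pairs. `decision` is "yes"
    or "no"; `ratings` maps the hypothetical category names "for_yes",
    "against_yes", "for_no", and "against_no" to lists of weights (0 to 6).
    """
    totals = {
        "choice_benefits": 0,
        "choice_penalties": 0,
        "alternate_benefits": 0,
        "alternate_penalties": 0,
    }
    for decision, ratings in dilemmas:
        if decision not in ("yes", "no"):
            raise ValueError("decision must be 'yes' or 'no'")
        chosen, other = ("yes", "no") if decision == "yes" else ("no", "yes")
        totals["choice_benefits"] += sum(ratings[f"for_{chosen}"])
        totals["choice_penalties"] += sum(ratings[f"against_{chosen}"])
        totals["alternate_benefits"] += sum(ratings[f"for_{other}"])
        totals["alternate_penalties"] += sum(ratings[f"against_{other}"])
    return totals


# Invented ratings for two dilemmas with two items per category (the
# instrument itself uses three dilemmas with five items per category).
example = [
    ("yes", {"for_yes": [6, 5], "against_yes": [1, 0],
             "for_no": [2, 1], "against_no": [4, 6]}),
    ("no", {"for_yes": [0, 1], "against_yes": [5, 5],
            "for_no": [6, 4], "against_no": [2, 0]}),
]
print(payoff_matrix(example))
```

With five items per category, three dilemmas, and weights from zero to six, each of the four variables can range from 0 to 90 under this reading.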

A pilot study determined that subject scores for each
of the three dilemmas were correlated with subject
scores on the instrument as a whole at levels of r that
ranged from .73 to .90. Split-half reliability during this
pilot study was calculated at .88 (n = 35). The responses
of the 214 subjects in the subsequent study
generated item-scale biserial correlations for the
Preferential Reasoning Profile that ranged from .49 to
.86, with a median value of .73.


Results and Discussion 


Since between-stage change was tenta- 
tively defined as change in the decision- 
making process in response to hypothetical 
moral dilemmas, it was hypothesized that 
subjects at different stages of moral devel- 
opment would demonstrate variance in their 
usage of the four response categories of the 
Preferential Reasoning Profile. That is, 
subjects at higher stages of moral develop- 
ment should endorse fewer reasons as justi- 
fication for their decision(s) of choice and 
should find almost no endorsable criteria in 
support of the alternate decision(s). While 
these subjects may have understood many if 
not all of the lower stage reasons for action, 
it was expected that they would not endorse 
such reasons as morally defensible, even 
when those reasons happened to support the 
decision of choice. Equal consideration of 
welfare claims would be increasingly re- 
stricted to claims regarding human worth 
and dignity, so that other potential welfare 
criteria (such as personal happiness or gain) 
would be ignored. Subjects at lower stages 
would be expected to endorse a wider variety 
of welfare claims (and penalties), including 
inconsistent claims as general evidence of 
disequilibrium. 

The null hypothesis, therefore, stated that 
scores for the four variables of the payoff 
matrix for subjects grouped according to 
moral development stage would not be dis- 
tinguishable from the matrices of subjects 
grouped at random from the same population.
Evidence needed to refute this null
hypothesis could be used to generate the
coefficients of a decision-making function
representing changes in the process of moral
decision making across moral development
stages.

Table 1
Means and Standard Deviations for Four Matrix Variables Across Five Moral
Development Stages

Matrix                      Moral development stage
variable                 1       2       3       4       5

Decision of choice
  Benefits
    M                  48.21   46.46   49.70   50.4    47.71
    SD                  8.43   12.85    9.32    9.58    8.66
  Penalties
    M                  30.50   31.39   31.66   30.83   30.06
    SD                  7.98   11.06    8.41    8.66    7.48
Alternate decision
  Benefits
    M                  30.07   32.62   29.05   31.02   29.42
    SD                  8.91   11.89    9.75   11.26    7.98
  Penalties
    M                  46.79   45.85   47.77   48.53   46.96
    SD                  7.10    9.73    9.78    9.20    9.20


On the basis of the assumption that the
decision-making process for a subject at any
particular stage of moral development
should be identical to the decision-making
process of all other subjects assessed at that
moral development stage, a multiple discriminant
analysis was chosen in order to
maximize the ratio of between and within 
sums of squares. The multiple discriminant 
analysis failed to find a linear combination 
of variables to distinguish the five groups, as 
is further shown by an obtained lambda of 
.93 and an approximate F of .95 (p < .51). 
The results of this analysis indicate that it 
was not possible to distinguish moral de- 
velopment stages on the basis of optimum 
differential weighting of the variables of the 
payoff matrix. Mean values for the matrices 
of subjects grouped according to moral de- 
velopment stage are reported in Table 1. 
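
The lambda statistic mentioned above is conventionally Wilks' lambda, the ratio of the determinant of the within-groups sums-of-squares-and-cross-products matrix to that of the total matrix; values near 1, such as the .93 obtained here, indicate that the groups are barely separated. The following pure-Python sketch illustrates the computation on invented two-variable data. It assumes that the reported lambda is Wilks' lambda, and all function names and data points are hypothetical; it does not reproduce the study's analysis.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


def mean_vector(rows):
    """Column means of a list of equal-length observation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sscp(rows, center):
    """Sums of squares and cross-products of rows about a center vector."""
    p = len(center)
    out = [[0.0] * p for _ in range(p)]
    for x in rows:
        d = [x[i] - center[i] for i in range(p)]
        for i in range(p):
            for j in range(p):
                out[i][j] += d[i] * d[j]
    return out


def wilks_lambda(groups):
    """Wilks' lambda: det(within-groups SSCP) / det(total SSCP)."""
    all_rows = [x for g in groups for x in g]
    total = sscp(all_rows, mean_vector(all_rows))
    p = len(total)
    within = [[0.0] * p for _ in range(p)]
    for g in groups:
        s = sscp(g, mean_vector(g))
        for i in range(p):
            for j in range(p):
                within[i][j] += s[i][j]
    return determinant(within) / determinant(total)


# Invented data: two identical groups, then two well-separated groups.
same = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
shifted = [(x + 10.0, y + 10.0) for x, y in same]
lambda_identical = wilks_lambda([same, same])
lambda_separated = wilks_lambda([same, shifted])
print(lambda_identical, lambda_separated)
```

In this toy example the identical groups give a lambda of essentially 1, whereas the well-separated groups give a lambda close to 0.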

The inability to identify significant dif- 
ferences between the variables of the payoff 
matrix for subjects grouped according to 
moral development stage precluded any 
further investigation into the magnitude or 
direction of change in the decision-making 
process across five stages of moral development.
In summation, the attempt to verify
that between-stage change reflects change 
in the process of adjudicating moral claims 
was entirely unsuccessful, and one surmises 
that some new attempt to clarify what de- 
velops across the stages is required. 

One wonders, of course, whether or not 
there is anything remarkable about a study 
that only requires subjects to “think what
they think” in a consistent way. Closer in- 
spection, however, indicates that the moral 
adequacy of a person’s judgment must now 
be differentiated from the rational adequacy 
of that same judgment. Moral development 
may well refer to developing knowledge, 
developing language, or the acquisition of 
new values. It apparently does not refer, 
however, to a developing sense of justice as 
rule reciprocity or the developing ability to 
state a rational case for human welfare. A 
similar sense of fairness seems to have in- 
formed the decision-making process of all 
subjects in the study, leaving a more precise 
understanding of specifically moral growth 
in doubt. Without a clearer understanding 
of between-stage change, it is difficult to 
assess the import of stage scores or their 
subsequent modification during the devel- 
opmental and educational processes. 


A Comment on “Race, IQ, and the Middle Class”
by Trotman: Rampant False Conclusions


Langdon E. Longstreth


University of Southern California


In a recent article, Trotman obtained parental responses to a child-rearing in-
terview in black and white samples and correlated the responses with the
achievement and intelligence indexes of the children. The correlations were
quite high, making it clear that adjustment for racial differences in parental
responses would all but remove racial differences in achievement and intelli-
gence of the children. Trotman makes several erroneous conclusions on the
basis of this result, and my comment points out these conclusions. In particu-
lar, I show how Trotman’s study says nothing important about the nature-
nurture issue as it applies to racial differences in intelligence.


In Trotman’s (1977) study of the relationships between maternal
responses to interview questions concerning various parental behavior
patterns in the home as related to the academic achievement and the
intelligence of children, a considerable service has been done in
attempting to replicate the original results of Wolf (1966) and to
extend them to a black sample. Wolf found a correlation of .69 between
ratings of maternal responses and IQ scores, a much higher correlation
than that obtained between socioeconomic status (SES) and IQ. An even
higher correlation of .80 was obtained between the maternal response
ratings and achievement test scores, leading Wolf (1966) to the
somewhat dramatic conclusion that “A measure of what parents do in the
home can be used to predict school achievement with a fairly high
degree of accuracy” (p. 498). This conclusion was offered as a contrast
to another presumed index of parental behavior, that of SES, which
correlates only about .40 with children's IQ scores.

Trotman (1977) found similarly high correlations between maternal
response ratings and achievement test scores: .76 for a black sample
and .70 for a white sample. It is clear that if IQ and achievement
scores were adjusted for racial differences in the maternal response
ratings, there would be very little difference left. This is an
important finding, since other studies have generally been
unsuccessful in completely accounting for such intellectual
differences solely in terms of home conditions.

At the same time, Trotman discusses these findings in a way that is
seriously inaccurate and misleading. The following four conclusions
were drawn by Trotman: (a) The maternal response ratings are valid
indicators of parental behavior in the home. (b) The correlations
between maternal response ratings and IQ and achievement test scores
indicate an effect of the former variable on the latter two variables.
(c) One of the “basic assumptions” of “hereditarian researchers”
(Jensen being the only person so labeled) is that “traditional SES
indicators adequately reflect the effects of environmental influences
on IQ” (Trotman, 1977, p. 271). (d) Therefore, the present results
“are antithetical to their [the hereditarians’] position” (Trotman,
1977, p. 271). None of these conclusions is necessarily true, and the
following discussion shows why.

The index of parental home behavior used by Trotman (1977) consists of
ratings of maternal responses to 63 interview questions devised by
Wolf concerned with what parents do with their children in the home
situation. We shall ignore the fact that the reliability of these
ratings, or even the identity of who made them (the interviewer, who
had access to much other information?), is not specified. Of central
concern is Trotman’s assertion that the obtained ratings measure
“important environmental variables,” an assertion in keeping with
Wolf’s previous claim. But in neither study is any evidence presented
to support this claim. No content validation information of any sort
is presented that pertains to the presumed relationship between
ratings of maternal verbalizations and actual parental behavior in the
home. Certainly something is associated with these ratings, or else
they would not be related to anything else in a systematic fashion.
But parental behavior is not the only possibility. It is likely that
maternal intelligence is another associated variable and, as a result
of assortative mating, paternal intelligence as well. In fact, the
known correlation of about .50 between parent-child IQ scores, when
coupled with the correlation of .63 between maternal response ratings
and child IQ as found in Trotman’s study, practically guarantees such
a relationship.

Thus, it is probably the case that Trotman’s major finding—the virtual
elimination of racial IQ differences by adjusting for racial home
environment differences (e.g., maternal response ratings)—is partly
due to the statistical control of parental intellectual differences
between the two groups. If this is so, then Trotman’s results are not
relevant to the nature-nurture issue at all, since the parental IQ
difference between the races could be either environmental or genetic
or both in origin, which is the very point under dispute.


1 Throughout Trotman's (1977) article various subtle and not-so-subtle
ad hominem remarks appear, directed mainly at Arthur Jensen. Thus, he
is called a “hereditarian” who believes in the “innate genetic
inferiority” (Trotman, 1977, p. 266) of the black races. Such tactics
in learned journals published by the American Psychological
Association are to be [...]. For the record, I have read Jensen
extensively and nowhere, to my knowledge, does he speak of any group
as being innately inferior to any other group. Nor is he a
hereditarian. For example, the last sentence of one of his most recent
publications reads as follows: “Thus it appears that a cumulative
deficit due to poor environment has contributed, at least in part, to
the relatively low average IQ in the present sample of blacks in rural
Georgia” (Jensen, 1977, p. 191). Jensen is clearly able to countenance
whatever the data indicate, an objectivity that is missing in the case
of many other parties to this issue, Trotman not excepted.

The second conclusion, that the home environment exerted an influence
on achievement and IQ characteristics, thereby accounting for the
obtained correlations between these same variables, is, of course, no
more demanded by the data than the opposite conclusion. Indeed, is it
not even more likely that parents, discovering they have a bright
child in the family, react to that information in various ways, such
as encouraging good study habits, planning and preparing for a college
education, and so on (e.g., by engaging in the kinds of behaviors
covered in Wolf's, 1966, maternal interview schedule)? Mountains of
data documenting such child-to-parents effects have been published in
recent years (e.g., Bell, 1971; Bowlby, 1958, 1969; Gewirtz & Boyd,
1976; Hurley & Hohn, 1971; Moss, 1967, 1970; Schoggen, 1963; Moss &
Robson, Note 1). How much stronger such effects must be when the
parents not only observe child behavior but also receive
institutionalized re[...] cards and parent-teacher conferences! With
such unimpeachable evidence of the child’s academic potential laid out
in explicit terms before them, it would be shortsightedness indeed not
to conclude that a child-to-parent causative chain is set into motion.
Yet, Trotman gives no consideration whatsoever to this likelihood.

The third conclusion, that a “basic assumption” of hereditarians is
that traditional SES measures adequately reflect the home environment,
is patently false. The true state of affairs seems to be as follows:
[...] them “environmentalists”), with a high SES
background being presumed ipso facto to 
produce a high IQ. Thus, nonhereditarians 
themselves have assumed traditional SES 
indexes to be valid and adequate, or else they 
would not have bothered to rationalize away 
the relationship with IQ in the first place. 
Those of a more open mind have entertained 
an opposite possibility, arguing that high IQ 
may qualify one for occupations that cannot 
be achieved by those with lower levels of in- 
telligence. Thus, members of both persua- 
sions have accepted SES at face value and 
have attempted to account for its relation- 
ship with other variables on their own terms. 
The discovery that other measuring tech- 
niques do a more accurate job of reflecting 
home conditions in no way impugns the 
theoretical positions of one side any more 
than it does the positions of the other side. 

These considerations should make it obvious
that Trotman’s fourth conclusion is not
tenable either. There is no way in which
Trotman’s findings are antithetical to a po- 
sition that assigns some role to the genes in 
accounting for racial differences in intelli- 
gence. The fact that black-white maternal 
response rating differences account for most 
or all of the black-white child IQ differences 
has absolutely nothing to do with the causes 
for the unadjusted difference. This is nec- 
essarily the case because (a) what the ma- 
ternal response ratings actually reflect is 
unknown, but parental IQ is one likely can- 
didate; (b) the cause-effect direction be- 
tween maternal response ratings and child 
intelligence is also unknown, but a child- 
to-parent direction is as likely a possibility 
as the reverse; and (c) even if the maternal 
response ratings did reflect only home en- 
vironment and even if the cause-effect di- 
rection between this score and child intelli- 
gence was only in the parent-to-child direc- 
tion, it still does not follow that the attenu- 
ation or elimination of racial IQ differences 
by matching for home environment dem- 
onstrates that racial IQ differences are due 
solely to environmental factors. This conclusion
simply does not follow.

An alternative conclusion is the following:
Genes affect intelligence—both within and
between races—and intelligence affects what
people do and say, including the child-rearing
practices of parents and the academic
achievement of their children. A relation- 
ship between parental behavior and child 
intelligence is therefore inevitable. It is a 
relationship of association, however, not of 
causation. 

In conclusion, Trotman’s study, while 
yielding interesting correlations between 
maternal verbalizations and child IQ scores, 
does not deal a death blow to “hereditar- 
ians”. What it does do is to emphasize that 
SES is a poor index of parental behavior in 
the home, and that other approaches may be 
better. It remains to be seen if maternal 
interviews fulfill this role. If the frequency 
of maternal interview studies in the child 
development field can be taken as an indi- 
cation of their fruitfulness, this possibility 
is unlikely. 


Reference Note 

Utility of Socioeconomic Status as a Control
in Racial Comparisons of IQ 


Joseph L. Wolff 


University of Missouri—Kansas City 


Trotman has presented evidence indicating that black and white families 
roughly equated in terms of socioeconomic status differ widely in terms of 
Wolf's measure of intellectual home environment. On the basis of this evi- 
dence, Trotman has drawn a variety of conclusions about the nonutility of so- 
cioeconomic status as a control in racial comparisons of IQ. In this article, the 
validity of Trotman’s conclusions is called into question by (a) a critique of
her methodology and (b) a critical examination of her results. 


This article is a critique of a recent article published in the
Journal of Educational Psychology, entitled “Race, IQ, and the Middle
Class” by Frances K. Trotman (1977).¹ I shall argue the following:
(a) The design of Trotman’s study exhibits at least one egregious
flaw. (b) Trotman's data display a variety of unusual features that,
at the very least, cast grave doubts on the external validity of her
study and, quite possibly, raise serious questions regarding the
internal validity of her research procedures.


Experimenter Bias


The major finding of Trotman’s study is that samples of black and
white families roughly equated on Warner, Meeker, and Eells's (1949)
Index of Status Characteristics (ISC) differed markedly on Wolf’s
(1964) measure of intellectual home environment. (The white-black
differential on ISC is .22 SD units; the white-black differential on
intellectual home environment is 1.03 SD units.)

Trotman’s research design is fatally flawed by the fact that she
herself administered both the intellectual home environment measure
and the ISC, while no measure of interrater reliability was reported.
Given the subjective character of many of the intellectual home
environment items and two of the four ISC items (Warner et al., 1949,
pp. 143, 153) and in view of the known effects of experimenter
expectancies on even highly objective measures (Rosenthal, 1966), it
is hard to believe that anyone with Trotman’s strong environmentalist
leanings could have produced unbiased ratings on either the
intellectual home environment or ISC measure.² This contention is, in
fact, supported to some degree by internal evidence in Trotman’s
reported data. Examination of her Table 5 shows that the two most
tangible aspects of the home environment (learning supplies and books)
show only a small white-black differential (i.e., 8 and 17). Four of
the largest white-black discrepancies, on the other hand, reside in
what are perhaps the most difficult aspects of the intellectual home
environment to objectively rate: information re development, emphasis
on language use, parents’ language usage, and rewards for
accomplishments. White-black discrepancies on these items range from
1.6 to 2.1 (SD range = 1.2 to 1.6). Though by no means conclusive, the
pattern in these scores suggests experimenter bias. Even without this
pattern, however, the fact that Trotman herself administered and
scored her two measures renders the results she obtained nearly
impossible to interpret.


1 This article is a revised version of a longer critique of Trotman's
(1977) article (Wolff, Note 1). For purposes of revision, the Journal
of Educational Psychology granted me the courtesy of seeing a version
of the critique by Langdon E. Longstreth.

2 The following statement by Loehlin, Lindzey, and Spuhler (1975) is
relevant to this point:

We have been concerned privately by the number of instances in which
the political and social preferences of the investigators apparently
have grossly biased their interpretation of data. Such distortions
appear to be at least as prevalent at environmentalist as at
hereditarian extremes. While we have preferred in this book to report
findings directly rather than their authors’ interpretations of them,
we must face the possibility that the influence of the investigators’
[...] process, and affected the gathering of the data as [...]
considerations. (pp. 232-

Requests for reprints should be sent to Joseph L. Wolff, School of
Education, University of Missouri, 5100 Rockhill Road, Kansas City,
Missouri 64110.


Anomalous Results 
Anomaly 1 


Wolf (1966) reported a correlation of .69 
between intellectual home environment and 
IQ for a stratified random sample of fifth- 
grade students selected in such a manner 
that “the number of cases drawn from each 
social class grouping was proportional to the 
number of male adults in each grouping in 
the United States population according to 
Department of Labor data” (p. 496). 
Trotman’s black and white family samples 
both fell within a highly truncated portion 
of this total range of socioeconomic status 
(SES): All fell within the ISC middle class, 
and the majority fell within the ISC intermediate-middle
and lower-middle classes
(M = 36.5, SD = 3.6; scale range = 12 to 84).
Despite this severe truncation of range, only
the white sample correlation between intellectual
home environment and IQ showed
evidence of attenuation (r = .37). The correlation
for black families was .68, a mere 1
point lower than Wolf's (1966) .69. This 
figure, though theoretically possible, is 
highly unlikely in view of the following 
facts: 

1. “On the whole the SES-IQ relations 
within U.S. minority groups resemble those 
in the white majority population” (Loehlin 
et al., 1975, p. 177), that is, r = approxi- 
mately .30. 

2. Despite the small SES range of her 
sample, Trotman reports moderate corre- 
lations between ISC and intellectual home 
environment within both her black and
white subgroups (for blacks, r = .27; for 
whites, r = .37). This finding suggests that 
the ISC and intellectual home environment 
measures are highly correlated (at least 
within races); hence, restriction of the ISC 
restricts the range of the intellectual home 
environment measure as well. 

3. The internal consistency of the intellectual
home environment measure reported
by Wolf (1966) for his full-range SES sample
was .89, while standardization norms for the
Otis Group Intelligence Test used by Trotman
indicate test-retest reliability of about
the same magnitude.

Given the above facts and the highly re- 
stricted SES range of Trotman’s black 
sample, the correlation of .68 between in- 
tellectual home environment and IQ must 
surely lie fairly close to the attenuated reli- 
abilities of her two measuring instruments. 
Though possible, this finding must be considered anomalous in view of
Wolf’s (1966) reported validity coefficient of .69 and Loehlin et
al.’s (1975) conclusion that “the relation of mental ability to
SES-related variables is similar for U.S. blacks and whites” (p. 169;
emphasis mine). The difference between the white (.37) and black (.68)
correlations must also be considered highly anomalous in view of
Coleman et al.’s (1966) finding that “the relative importance of
educationally related attributes of the home (parents' education,
reading matter) compared to indicators of the economic level is
greater for white children than for minority group children” (p. 302).
This anomaly appears even greater when viewed against the fact that
Tulkin and Newbrough (1968, Table 4) found almost identical
correlations between IQ and measures of family and cultural
participation (similar to many items on Wolf's intellectual home
environment measure) in their high-SES black and white family samples
and slightly higher correlations between these variables in their
low-SES white sample than in their low-SES black sample.
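
The attenuation argument in Anomaly 1 leans on two standard results from classical test theory: an observed correlation cannot exceed the square root of the product of the two reliabilities, and a reliability coefficient shrinks when the range of true scores is restricted. The following is a minimal Python sketch of both relations, not a reanalysis of Trotman's data. It assumes that error variance is the same in the restricted and full-range groups, takes .89 for both reliabilities (the text says only that the Otis figure is "about the same magnitude"), and uses a purely hypothetical ratio of restricted to full-range standard deviations, since no full-range SD is reported in the source.

```python
import math


def max_observed_r(rel_x: float, rel_y: float) -> float:
    """Classical ceiling on an observed correlation: sqrt(r_xx * r_yy)."""
    return math.sqrt(rel_x * rel_y)


def restricted_reliability(rel_full: float, sd_ratio: float) -> float:
    """Reliability in a range-restricted group.

    Assumes error variance is unchanged by the restriction.
    sd_ratio = SD_restricted / SD_full, with 0 < sd_ratio <= 1.
    """
    return 1.0 - (1.0 - rel_full) / sd_ratio ** 2


REL_FULL = 0.89              # full-range reliability quoted in the text
HYPOTHETICAL_SD_RATIO = 0.6  # illustration only; not reported in the source

ceiling_full = max_observed_r(REL_FULL, REL_FULL)
rel_restricted = restricted_reliability(REL_FULL, HYPOTHETICAL_SD_RATIO)
ceiling_restricted = max_observed_r(rel_restricted, rel_restricted)

print(f"full-range ceiling on r:       {ceiling_full:.2f}")
print(f"restricted reliability:        {rel_restricted:.2f}")
print(f"restricted-range ceiling on r: {ceiling_restricted:.2f}")
```

The point of the sketch is only the direction of the effect: the narrower the sample, the lower the ceiling against which a reported correlation such as .68 has to be judged. The actual size of the drop depends on an SD ratio that the source does not supply.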


Anomaly 2 


Trotman reports a correlation of .25 between
ISC and IQ for her black family sample.
This should be compared with an average
correlation of about .30 between full-range
SES and IQ among blacks (Loehlin et
al., 1975, p. 169). Why was there not greater 
attenuation in Trotman's correlation? This 
mystery appears even greater when it is 
considered that the study excluded children 
in upper-lower-class and lower-lower-class 
households, which are SES levels where 
"environmental threshold" effects (see 
Jensen, 1968, pp. 10-14) are most likely to be 
operating. That is, environmental differ- 
ences at the extremely deprived end of the 
environmental continuum are thought to 
have far greater effects on intelligence than 
do those in the middle or upper ranges; 
hence, it is precisely in this lower range that 
environment-IQ correlations should be 
highest, and studies excluding children 
surrounding this environmental threshold 
should evince markedly lower SES-IQ cor- 
relations (cf. Golden & Bridger, 1969, p. 649; 
Scarr-Salapatek, 1971, p. 1225). 


Anomaly 3 


Trotman reports a correlation of .71 be- 
tween intellectual home environment and 
scores on the Metropolitan Achievement 
Test (MAT) for white children and a correlation
of .76 for blacks. Though the .76 for
blacks is strange, I shall confine my com- 
ments to the white correlation. Wolf (1966) 
reports a correlation of .80 between his 
measure of “the environment for the devel- 
opment of academic achievement” (p. 497) 
and MAT for his full-range SES sample of 
midwestern children. Since this measure, 
unlike the intellectual home environment, 
was designed especially to predict academic 
achievement (rather than IQ) and has, 
moreover, a higher reliability than the in- 
tellectual home environment measure (.95 
vs. .89), we can assume that the unreported
intellectual home environment-MAT correlation
[...] smaller than the .80 reported [...]
MAT and his environmental [...]
development of academic achievement. At
best, the former correlation should have been
in the low .70s. Why, then, with a severely
restricted sample of white families (middle-class
families only) does Trotman obtain
a correlation of .71? Why was there no attenuation?
The finding can only be considered
anomalous.


Anomaly 4 


Trotman reports a correlation of .83 be- 
tween intellectual home environment and 
grade point average for her white sample. 
This is an extremely peculiar finding in view 
of the known unreliabilities of teacher marks 
and the normal .50 relationship between 
traditional predictors and grade point average
(Astin, 1971, p. 6; Lavin, 1965, p. 51).³
It is unlikely that the reliability of junior 
high school grade point average itself is much 
higher than .83 (Lavin, 1965, pp. 19-20); and 
within the restricted IQ and SES ranges in- 
vestigated by Trotman, normal reliability 
(whatever it may be) must surely be greatly 
attenuated. How can a test whose own 
full-range SES reliability is only .89 and 
whose correlation with the MAT in Trot- 
man’s sample is merely .71 do so well as a 
predictor of the protean grade point average?
Again, the reported data must be
judged anomalous.⁴


3 These correlations are based principally on data 
taken at the college level. Examination of Trotman's
Figure 1 indicates, however, that the IQ variance of her 
white sample is no greater than is found in college 
populations. Moreover, since her sample of children 
is more economically homogeneous than are college 
populations, there is no reason to expect a higher pre- 
dictive relationship in her study. 


4 [...] confidence limits for the coefficients Trotman reports.
Beginning with Anomaly 1 and proceeding through
Anomaly 4, these are as follows: for .68, .50 to .81; for
.25, -.03 to .49; for .71, .54 to .83; and for .83, .72 to .90.
Viewed against these confidence limits, one might
hazard the opinion that Anomaly 2 is slight, Anomaly
3 is considerably more serious, and Anomaly 4 is grave
indeed. For example, in regard to Anomaly 4, the
normal .50 relationship between traditional predictors
and grade point average falls well below the lower bound
(.72) of the 95% interval for the reported correlation
coefficient of .83.
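
The footnote does not say how its 95% limits were obtained. One conventional route is the Fisher z transformation, sketched below in Python as an illustration rather than as a record of the author's procedure. The sample size of 50 per group is an assumption: it is not stated in this passage, but under it the transformation gives limits that agree, to two decimals, with the four intervals quoted above.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson r via Fisher's z.

    z = atanh(r) is roughly normal with standard error 1 / sqrt(n - 3);
    the limits are mapped back to the r scale with tanh.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


N_ASSUMED = 50  # assumption: per-group sample size, not given in this passage

# Coefficients discussed under Anomalies 1 through 4.
for r in (0.68, 0.25, 0.71, 0.83):
    low, high = fisher_ci(r, N_ASSUMED)
    print(f"r = {r:.2f}: {low:.2f} to {high:.2f}")
```

Because the interval is built on the z scale, it is asymmetric around r, which is why the limits for .83 sit closer together above the estimate than below it.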
Anomaly 1 might be considered moderately serious
if Wolf's (1966) validity coefficient of .69 is taken at face
value. There is reason to believe, however, that Wolf's
figure of .69 is itself an overestimate of the degree of
relationship between home environmental variables and
IQ. Thus, in studies similar to that of Wolf, Burks


Anomaly 5 


Trotman reports a 1.03 SD unit difference 
on Wolf's intellectual home environment 
measure between her black and white family 
samples that favors the latter. This result 
is in conflict with Bloom, Whiteman, and 
Deutsch’s (Note 2) finding that relationships 
between social class and family environ- 
mental conditions are very similar among 
American blacks and whites, but that “social 
class may be a more potent variable than 
race in predicting to environmental and at- 
titudinal factors” (p. 10). It is also highly 
discordant with Tulkin's (1968) finding that 
measures of family participation and cul- 
tural participation (similar to many items on 
Wolf's intellectual home environment) show 
no difference between upper-SES whites and
blacks and show significant (p < .01) black 
superiority over whites among lower-SES 

families on the family participation scale and 
nonsignificant black superiority on the cul- 
tural participation scale. 


Anomaly 6 


Trotman reports an approximately 1 SD 
unit difference between white and black 
mothers’ intellectual aspirations for their 
children, white mothers expressing higher 
aspirations than black mothers. This find- 
ing contradicts those of previous investiga- 
tors, who have found that black parents have 
higher academic aspirations for their chil- 
dren than do white parents of the same SES 
(see, e.g., Baratz & Baratz, 1970, p. 38; 
Coleman et al., 1966, p. 302; Katz, 1968, p. 62; 
Kirkpatrick, 1973, p. 356). Since intellec- 
tual aspirations and intellectual expectations 
form 2 of the 13 variables making up the in- 
tellectual home environment, this discrep- 
ancy is not insignificant. 


(1928) obtained correlations of .42 (Whittier index) and
.44 (culture index) between IQ and ratings of the home
environment, and Leahy (1935) found correlations of
.58 (environmental status of home) and .51 (cultural
status of home) between similar pairs of indices.
Considered in the light of these values, Trotman’s correlation
coefficient of .68 appears far too high. Indeed,
were it possible to correct it for attenuation due to restriction
of SES, it would probably be twice the size of
Burks's correlations and over half again as high as Leahy's!
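
The footnote speaks of correcting the .68 for restriction of SES without naming a procedure. A textbook option for this situation, where selection operates on a third variable (here the ISC) rather than on either correlated measure, is the correction for indirect range restriction often labeled Thorndike's Case 3. The Python sketch below is offered as an illustration under stated assumptions, not as the author's computation. The three within-sample correlations are those reported in the text for the black sample; the ratio of unrestricted to restricted ISC standard deviations is hypothetical, because the source gives only the restricted SD (3.6).

```python
import math


def correct_indirect_restriction(r_xy: float, r_xz: float,
                                 r_yz: float, u: float) -> float:
    """Correct r_xy for range restriction on a third variable z.

    r_xy, r_xz, r_yz are correlations observed in the restricted sample.
    u = SD_z(unrestricted) / SD_z(restricted), so u >= 1.
    """
    k = u ** 2 - 1.0
    numerator = r_xy + r_xz * r_yz * k
    denominator = math.sqrt((1.0 + r_xz ** 2 * k) * (1.0 + r_yz ** 2 * k))
    return numerator / denominator


R_HOME_IQ = 0.68    # intellectual home environment with IQ (black sample)
R_ISC_HOME = 0.27   # ISC with intellectual home environment (black sample)
R_ISC_IQ = 0.25     # ISC with IQ (black sample)
HYPOTHETICAL_U = 2.0  # illustration only; full-range ISC SD is not reported

corrected = correct_indirect_restriction(
    R_HOME_IQ, R_ISC_HOME, R_ISC_IQ, HYPOTHETICAL_U
)
print(f"restricted r = {R_HOME_IQ:.2f}, corrected r = {corrected:.2f}")
```

When u equals 1 the function returns the observed correlation unchanged, and larger values of u push the corrected figure upward, which is the direction the footnote's argument requires. How far it rises depends entirely on the unreported full-range SD.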

Conclusions


If we accept at face value the various 
anomalies in Trotman’s data just discussed, 
we are obliged to conclude that her sample 
is highly peculiar, and therefore, findings 
based on this sample lack external validity.
If the sample is considered representative, 
on the other hand, serious doubts about the 
internal validity of her study are immedi- 
ately raised by the several anomalies just 
considered. 


Race, IQ, and Rampant Misrepresentations: A Reply


Frances K. Trotman 
Institute for Counseling and Psychotherapy 
Fort Lee, New Jersey 


In Longstreth's recent critique to Trotman's earlier article, he attributes con- 
clusions and interpretations to Trotman that indicate either a misunderstand- 
ing or misinterpretation on Longstreth's part or a failure in effective commu- 
nication on Trotman's part. In any case, Trotman's recent response repre- 
sents an attempt to clarify Longstreth's apparently cloudy view of the earlier 
Trotman work. Wolff has suggested that experimenter bias operated to pro- 
duce anomalies in Trotman's findings that despite similarity in socioeconomic 
standing, there was a difference in the intellectual home environments of 
blacks and whites, that IQ was related to home environment, and that home 
environment did as well as academic achievement in predicting IQ. On exam- 
ination of Wolff's statements and reference citations, however, Trotman finds 
evidence of bias and irregularities in his critique. For example, Wolff
misconstrues data, selectively either omits or includes reference citations,
or statements out of context, and misrepresents previous investigations,
suggesting a possibility of commentator bias.


Realizing that some readers may rely on 
their recollection of my article (Trotman, 
1977) and its misinterpretation by Long- 
streth (1978), I will now attempt to present 
a clarification of the four so-called "rampant 
false conclusions" that I am reported to have 
"drawn" (Longstreth, 1978, p. 469) as well as 
of the "various subtle and not-so-subtle ad 
hominem remarks" (Longstreth, 1978, 
Footnote 1) that I am supposed to have 
made. I will then turn to the subject of sci- 
entific bias that was raised by Wolff (1978) 
in his critique. 


A Clarification for Longstreth 


The first conclusion attributed to me by 
Longstreth (1978, p. 469), that is, “maternal
response ratings are valid indicators of pa- 
rental behavior in the home,” is not a con- 
clusion at all but rather an assumption—the 
same type of assumption that most re- 
searchers who rely on verbal responses 
make—that the respondent is telling the 
truth. It is, of course, possible that a sam- 
pled mother may have lied, for example, 
about having an encyclopedia in the home, 


Requests for reprints should be sent to Frances K. 
Trotman, 216 Ivy Lane, Teaneck, New Jersey 07666.


about the kind of encyclopedia, when it was 
purchased, for what reason, if the accompa- 
nying yearbooks were purchased yearly, and 
so on. The fact that the trained interviewer
also made careful observations perhaps was 
no safeguard, since the mother may have 
imported a neighbor’s set of encyclopedias 
just for the interview. I do not deny that 
the possibility of parental falsification is a 
risk taken in this type of investigation. 
However, the verity of maternal response 
was, I believe, a reasonable and necessary 
assumption that did not distort my find- 
ings. 

Longstreth (1978) considers it “likely” (p. 
470) that within a socioeconomically homo- 
geneous middle-class sample, such variables 
as maternal intelligence, assortative mating, 
and paternal intelligence accounted for the 
observed environmental and attitudinal
differences between blacks and whites as
well as for the finding that home environment
predicted IQ at least as well as academic
achievement. Though I find such an
explanation to be rather circuitous, I would
never conclude that it was not a possible—however
farfetched—interpretation of the
reported correlations (see Trotman, 1977).
I found and reported important relationships
among the variables of race, IQ, academic
achievement, and home environment
within a middle-class population.


The results cannot provide definitive or conclusive evidence
in support of any of the major interpretations of
the IQ difference between the races; the results are,
however, pertinent to any attempt to disentangle the
variables involved in the IQ controversy. (Trotman,
1977, p. 271)


1 As specified in my article (see Trotman, 1977, p.
267), the interviews were recorded, scored, and rated
before viewing school records.

Accordingly, I hesitate to even comment 
on my alleged second conclusion, that is, the 
allegation that I concluded that “the corre- 
lations between maternal response ratings 
and IQ and achievement test scores indicate 
an effect of the former variable on the latter 
two variables” (Longstreth, 1978 p. 469). 
| We certainly all know by now that correla- 
tion does not equal causation. I do have a 
perspective from which I view the world, life, 
and social scientific data. And from my 
perspective, having observed correlation
data, I might pose ideas to the reader— 
suggest, postulate, theorize, or speculate— 
but certainly never conclude. 

The third “rampant false conclusion” attributed
to me, that is, that “one of the ‘basic
assumptions’ of ‘hereditarian researchers’ . . .
is that ‘traditional SES [socioeconomic status]
indicators adequately reflect the effects
of environmental influences on IQ’”
(Longstreth, 1978, p. 469), is also not a con- 
clusion but rather an allusion to my state- 
ment of what I see as the implied assumption 
of theories that suggest a genetic basis for the 
IQ difference between the races; that is, 


given a one-standard deviation difference in the intelligence
test scores of black and white Americans . . . and
finding that this discrepancy between the races remains
even within the same socioeconomic stratum . . . some
investigators have suggested the “reasonable and likely
hypothesis that genetic differences are involved in the
Negro-white IQ difference” (Jensen, 1972, p. 421).
(Trotman, 1977, p. 266)2


As Longstreth (1978) reminds us, most social 
scientists have “accepted SES at face value”
(p. 471). My concern, however, lies with
those who take the discrepant IQs of blacks
and whites within the same socioeconomic
stratum and, under the assumption that
similarity in SES assures sufficient control
for relevant environmental influences, infer 
or suggest a genetic basis for the discrep- 
ancy. 

This brings us to the conclusion that I 
have drawn from my 1977 study: 


By questioning one of the basic assumptions of the 
hereditarian researchers and commentators (i.e., that 
traditional SES indicators adequately reflect the effects 
of environmental influences on IQ), the results are an- 
tithetic to their position. Conclusions about a genetic 
inferiority of blacks, based on the use of traditional SES 
indicators as measures of the intellectual quality of the 
environment, are untenable against the present find- 
ings. (Trotman, 1977, pp. 271-272) 

The results of this study do clearly demonstrate that 
traditional indices of SES—such as income, occupation, 
and living conditions—represent insufficient assess- 
ments of important environmental variables related to 
intelligence test results. Future researchers who wish 
to determine the relationship of the environment to 
intellectual performance must measure or control not 
only the gross indicators of class status but also the 
underlying process variables known to be related to IQ 
and academic achievement. (Trotman, 1977, p. 272) 


On the Subject of Scientific Bias: Wolff's 
Critique 


It puzzled me why Wolff (1978) made so 
much of the concept of investigator/com- 
mentator bias in his criticism of my work. It 
was certainly clear in my article that the 
measurement data on IQ and grade point 
average were not extracted from school files 
until after the interviewing had been com- 
pleted and the ratings recorded, so that ex- 
perimenter bias could have affected nothing 
other than the difference in intellectual 
home environment means for the two races, 
a difference to which Wolff devoted rela- 
tively little criticism. Surely, if experi- 
menter bias had been at work to distort the 
intellectual home environment results, the 
home environment’s relationship to the 
other variables as represented by the corre- 
lation coefficients would have been smaller 
and perhaps more pleasing to Wolff. 

The concept of scientific bias is certainly
a useful and important one, particularly in
view of the passions raised by the old
nature-nurture or the recent race-IQ controversies
(see Loehlin, Lindzey, & Spuhler,
1975, pp. 7-8). The notion of commentator
bias was indeed quite helpful to me in my
attempt to understand Wolff's comments on
my article. There are, of course, researchers,
statisticians, and commentators of different
dispositions or persuasions who would find
the criticized data (Trotman, 1977) to be
quite reasonable and representative.

2 I included the reference to Arthur Jensen in my 1977
article so that readers might judge for themselves the
extent to which my “tactics” must be “deplored”
(Longstreth, 1978, Footnote 1).

The six so-called anomalies listed by Wolff 
are basically his opinions and interpreta- 
tions; the disputed data are not anomalous 
at all. Wolff's Footnote 4 is probably his 
least distorted and most straightforward 
summary statement of his critique. On an 
examination of that footnote, we find that
his computed 95% confidence limits for the
allegedly anomalous correlation coefficients
in fact embrace most reasonable comparison
coefficients.3 Quotes and studies cited by
Wolff to support his disapproval of my data 
are, on reexamination, actually compatible 
with my findings. To support his opinion 
concerning so-called Anomaly 1, for exam- 
ple, Wolff uses a citation by Loehlin et al. 
(1975, p. 169). On the same page as that 
cited by Wolff, Loehlin et al. (who, by the 
way, mention only one study of actual 
SES-IQ comparisons, that of Duncan, 1968) 
state that in the only study for which they 
report data on females, “for women, the ed- 
ucation-income correlation was somewhat 
higher for blacks: .50 versus -83" (Loehlin 
et al., 1975, p. 169).4 Continuing in the same 
paragraph, Wolff's (1978, p. 474) use of a 
quote by Coleman et al. (1966, p. 302) referred,
in fact, to achievement rather than to
IQ as implied by Wolff; the citation therefore 
has only minor relevance to Wolff's dispu- 
tation of the intellectual home environ- 
ment-IQ correlations. Also in the same 
paragraph, Wolff (1978, p. 474) refers to the 
data of an investigation (Tulkin & New- 
brough, 1968, Table 4) from which middle- 
SES subjects were excluded—a crucial fact 
that Wolff neglects to point out. 

The supposedly opposing studies (i.e., 
Baratz & Baratz, 1970; Coleman et al., 1966;
Katz, 1968; Kirkpatrick, 1973; Tulkin, 1968; 
Bloom, Whiteman, & Deutsch, Note 1) cited 
by Wolff to support his alleged Anomalies 5 
and 6 are also reconcilable with the findings
of my research. In addition to differences
such as having excluded the middle class, the 
investigations cited by Wolff did not use 
instruments comparable to intellectual home 
environment to measure home environment 
variables. A major feature of the intellectual 
home environment measure is its use of rel- 
atively tangible criteria. For example, it 
elicits and rates the extent of the parents’ 
financial preparation for and active exploration
of higher education in addition to the
parents' thoughts, feelings, or beliefs in order 
to determine the intellectual aspirations that 
they have for their children. In this respect, 
the intellectual home environment is sensi- 
tive to what a parent does as well as to how 
he or she feels. It is conceivable that parents
might, for example, express a strong desire 
for their children to attend college but have 
made no tangible preparations for that 
event. A traditional assessment of the 
parents' attitudes would probably indicate 
high intellectual aspirations, whereas the 
intellectual home environment rating of the 
variable would be considerably lower. In- 
deed, we can turn for enlightenment to a 
study (Katz, 1968, p. 63) on the subject, 
which is cited by Wolff (1978, p. 476). Fol- 
lowing the bit of information (Katz, 1968, p. 
62) that was isolated by Wolff, Katz goes on 
to support my belief that it is not an anomaly 
in my findings, but rather the relative sen- 
sitivity of the intellectual home environment 
that accounts for any discrepancy between
my findings and those of other researchers:


These aspirations are so discrepant with the amount of
effort lower-class parents actually devote to their children’s
educational needs (for example, helping with
homework), and so unrealistic . . . as to suggest that they
are merely empty statements made for the benefit of the
interviewer, or expressions of fantasies that have
nothing to do with the real events. (Katz, 1968,
p. 63)


3 Although all of the criteria used by Wolff are
somewhat debatable, the .72 to .90 limits in so-called
Anomaly 4 have no truly comparable criterion coefficient
against which to be judged; Wolff's .50 correlation
represents the relationship between “traditional predictors”
and grade point average, not intellectual home
environment and grade point average.

4 In view of the possible differences between males
and females in this area (see, e.g., Bradley, Caldwell, &
Elardo, 1977; Elardo, Bradley, & Caldwell, 1977; Moss,
1967; Will, Self, & Datan, 1976) and considering that the
population of my study consisted of only females, I
think that this is an important point and a significant
oversight by Wolff.

On the same page of another of Wolff's
citations (i.e., Coleman et al., 1966, p. 302),
the reader finds possible additional clarification
of the apparent discrepancy and further
support for my contention:


The parents of . . . minority group children are less able
to translate their interest into effective support for the
child's learning than are white or Oriental American
parents. (Coleman et al., 1966, p. 322)


Similarly, other (if not Wolff's) citations
can be used throughout Wolff's critique as
support for rather than as refutation of my
data. In fact, a very recent study by Bradley
et al. (1977) suggests that an


environmental process measure predicts IQ as well as
a combination of process and status measures, whereas
there is loss in predictive power when SES is used by
itself (especially in the case of blacks). The process
measure appears to be a more accurate index of environmental
quality across groups than does SES. (p.


The authors continue, 


The correlations observed for females were exceedingly 
high (R = .825). An examination of simple and partial 
correlation coefficients among HOME subscales and IQ
reveals that mental test performance for females is also 
related to a wider variety of environmental inputs than 
for males. (Bradley et al., 1977, p. 700) 


Clearly, one’s biases may be at work to 
influence one’s reading and/or memory of
past and present investigations. Surely
Wolff’s concerns about experimenter bias 
can be broadened to encompass a concern for 
the biases of those who read, interpret, and 
critique empirical findings. 


Reference Note 


1. Bloom, R., Whiteman, M., & Deutsch, M. Race and
social class as separate factors related to social environment.
Paper presented at the meeting of the
American Psychological Association, Philadelphia,
September 1963. (Cited in Tulkin, S. R. Race, class,
family, and school achievement. Journal of Personality
and Social Psychology, 1968, 9, 31-37.)


Sex Differences in Locus of Control and Performance
Under Competitive and Cooperative Conditions


Stephen Nowicki, Jr., Marshall P. Duke, 
and Mary Pat Duncan Crouch 
Emory University 


For the purpose of testing the possibility that certain factors may mediate the 
locus of control of reinforcement/achievement relation, college students were 
asked to compete against or cooperate with same- or opposite-sex partners.
The hypothesis that the achievement behavior of internally controlled females
would be more affected by sex of partner and type of competition than that of
males was confirmed and replicated in a second study. Although internally
controlled males increased their achievement more than did externally controlled
males, internally controlled females increased their performance when
competing against males or cooperating with females. These results are interpreted
as meaning that females may require more complex models to describe
their behavior in general and the locus of control/achievement relation in particular.


Rotter’s social learning theory (Rotter,
Chance, & Phares, 1972) was the source for
[...] most popular present day research variable
(e.g., Carlson, 1975), locus of control was
defined by Rotter (1966) as follows:


When a reinforcement is perceived by the subject as
following some action of his own but not being entirely
contingent upon his action, then, in our culture it is
typically perceived as the result of luck, chance, fate, as
under the control of powerful others, or as unpredictable
because of the great complexity of the forces surrounding
him. When the event is interpreted in this
way by an individual, we have labeled this a belief in
external control. If the person perceives that the event
is contingent upon his own behavior or his own relatively
permanent characteristics, we have termed this
a belief in internal control. (p. 1)


From such a definition it follows that an
internal as opposed to an external locus of
control should be related to greater
achievement effort and, because of this effort,
greater achievement. As Lefcourt
(1976) stated,


the link between internal control and cognitive activity
appeals to common sense. In like fashion, common
sense suggests that a disbelief in the contingency between
one’s efforts and outcomes should preclude
achievement striving. Without an expectation of internal
control, persistence despite imminent failure, the
postponement of immediate pleasures, and the orga[...]
essential to any prolonged achievement effort, will occur
only among individuals who believe that they can,
through their own efforts, accomplish desired goals; that
is, individuals must entertain some hope that their
efforts can be effective before one can expect them to
make the sacrifices that are prerequisite for achievement.
(pp. 66-67)


However, the locus of control/achievement
relation has not always received consistent
empirical support; for example, although
there is some evidence for a relation between
an internal locus of control orientation and
[...] achievement for middle[...]
findings are more ambiguous for females (see
Nowicki, Note 1). The lack of theoretically
consistent results in academic achievement
suggests that there may be a need to reexamine
the conceptual reasoning underlying
the achievement/locus of control relation,
especially in the case of females. As Heilbrun,
Piccola, and Kleemeier (1975) concluded,


while achievement behavior in females may succumb to
simplistic explanation the lesson of research into female
sex-role behavior is that it is more rather than less
complex than that of males. Thus, the time is still
for broad multivariable studies of female achievement
behavior which provide the opportunity to detect
combinational effects. Once the matrix or matrices of
achievement-relevant factors have been identified, the
role of any specific variable can be fruitfully explored.
(p. 2)


[...] be sent to Stephen Nowicki, Jr., Department of
Psychology, Emory University [...]

In the present study, the authors exam- 
ined the possibility that the locus of con- 
trol/achievement relation may be mediated 
by certain personal and situational charac- 
teristics. Two prime candidates for medi- 
ators of the locus of control/achievement 
relation were sex of the interactors and the 
cooperative versus competitive nature of the 
achievement task. There is some reason to 
believe, for instance, that the achievement 
performance of some women may suffer 
when they compete against men. Heilbrun, 
Kleemeier, and Piccola (1974) found this to 
be the case. Likewise, Horner (1972) pos- 
tulated that some women showed what could 
best be described as a “fear of success” when 
competing against men. Other work gen- 
erally suggests that males are more respon- 
sive to competition and females are more 
responsive to cooperation in terms of 
[...] performance (Senior & Brophy, 1973).

For the investigation of the effects of the 
degree of competition with the sex of the 
subject and the interactors on achievement 
behavior, male and female subjects differing 
in locus of control orientation competed 
against or cooperated with same- or oppo- 
site-sex partners. On the basis of social 
learning theory, it was predicted that inter- 
nal males compared with externals should 
respond to an achievement situation with 
increased performance. However, we be- 
lieve that the internal females will be more 
sensitive to the changes in situational de- 
mands than the internal males. We are as- 
suming that with whom they are competing 
and how matter to the internal females. 
Although females generally seem to work 
better under cooperative than competitive 
conditions, the internal female is somewhat 
different, since her internal orientation is not 
consistent with the traditional female sex 
role of passivity. We believe that such an 
inconsistency may lead an internal female to
have special incentive to achieve against
males. Therefore, it was predicted that internal
females will probably increase their
achievement behavior most when competing
with males compared with other groups and
in other situations.


Experiment 1 
Method 


Subjects. The subjects were 60 volunteers taken 
from two social sororities and 20 females and 80 males 
chosen from the subject pool associated with the in- 
troductory psychology course. The subjects were 
comparable in age and were predominantly members 
of the middle and upper-middle classes. The median 
age of the subjects was 19. All subjects participating 
in the study were white. 

Materials. Locus of control orientation was mea- 
sured by the Adult Nowicki-Strickland Internal-Ex- 
ternal control scale (ANS-IE). Satisfactory reliability 
and validity for the use of this scale with this population 
are reported elsewhere (Nowicki & Duke, 1974). 

The main performance task chosen for the present 
study was a digit-symbol task similar to that found in 
the Wechsler Adult Intelligence Scale (1955). It con- 
sisted of a key in which the digits were associated with 
separate symbols. The subjects were presented with 
100 randomly ordered digits, with empty boxes beneath 
each digit. The subject's task was to match the ap- 
propriate symbols with the digits. He or she had 90 sec 
to complete as many of the symbols as possible. 

Experimenters were a white female senior college 
student and a white male senior college student. 

Procedure. In the first session the subjects were 
tested in large groups (n = 20-57) and were given the 
following instructions: 


In this experiment I'm testing the way in which cer- 
tain personality traits are related to each other. I 
will give you two tests; for one you'll have as much 
time as you need, and for the other you'll have only 
ninety seconds. You may be called back to retake 
part of this experiment in several weeks. It is very
important that you come back if called, so I hope 
you'll make every effort to cooperate. Do you have 
any questions? If not you may begin. 


After the initial questionnaire had been completed, 
subjects were given the ANS-IE and the digit-symbol 
test. About half of the subjects took the digit-symbol 
test first; the rest took the ANS-IE first. Because the
testing was done in large groups, this order could not be 
completely randomized. When both tests were com- 
pleted, subjects were thanked and reminded of the 
importance of returning if called. 

On the basis of the ANS-IE scores, subjects were di- 
vided into internal and external groups. The median 
score of 8 was used as the cutoff point, with scores less 
than 8 being considered internal and scores greater than 
or equal to 8 being considered external. 
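
The cutoff rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `split_by_cutoff` and the example scores are hypothetical and are not taken from the study; only the rule itself (scores below the median of 8 are internal, scores of 8 or more are external) comes from the text.

```python
from statistics import median


def split_by_cutoff(scores, cutoff=None):
    """Label each ANS-IE score as 'internal' or 'external'.

    Scores strictly below the cutoff are labeled internal; scores at or
    above it are labeled external. If no cutoff is supplied, the sample
    median is used, mirroring a median split.
    """
    if cutoff is None:
        cutoff = median(scores)
    return ["internal" if s < cutoff else "external" for s in scores]


# Hypothetical ANS-IE scores, invented purely for illustration.
example_scores = [3, 5, 7, 8, 8, 9, 11, 14]

# The text reports a median of 8, so that value is passed explicitly here.
labels = split_by_cutoff(example_scores, cutoff=8)
```

Note that a subject scoring exactly at the cutoff falls in the external group, so the two groups need not be equal in size.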

After an average time period of about 2 weeks, 
subjects were called and asked to come back for a second
testing. In each internal and external group, subjects
were randomly put into a competitive or a cooperative
condition for the second testing. The degree of competitiveness
was determined by instructional set.
Subjects were led to believe that their partners were 
subjects like themselves who also had participated in 
the original testing. Subjects were given the following 
instruction in the competitive condition:


The two of you got about the same score on the first
test I gave you. Because I need a separate ranking 
for each of you, I asked you to retake the test to see 
which of you scores better this time. It is impor- 
tant that you do your best, since whoever gets the 
highest score on this test will receive a reward of 
five dollars. 


In the cooperative situation, these instructions were 
given: 


You and your partner today got about the same 
score on the first test I gave you. At that time, you 
were tested and scored individually. This time, 
your score will be added to your partner's score in 
compiling the data. It is important that you do 
your best, since the pair that receives the highest 
combined score will receive a reward of ten dollars 
to be split between the two. 


After the instructions were given, the digit-symbol 
test was readministered with the same time limit as for 
the original testing. With the intent of allowing a larger 
number of subjects to be tested, male or female ac- 
complices of the experimenter played the role of a 
subject's partner in about half of the cases. The ac- 
complices were two white male and two white female 
Emory University undergraduates. None of the 
subjects had met the accomplices previously.


Results 


For each subject, the difference between
the scores of the first and second digit-symbol
tests was obtained. The mean difference
scores are shown in Table 1. The presentation
of experimental stimuli did not show
an order effect, nor were there significant
effects as a function of experimenters or accomplices.
As a result, the data were collapsed
across these variables for analysis.
The results were analyzed by a 2 (Internal vs.
External) × 2 (Male vs. Female Subject) ×
2 (Male vs. Female Partner) × 2 (Cooperation
vs. Competition) analysis of variance.
Consistent with the basic hypothesis, internals
increased their performance significantly
more than did externals, F(1, 143) =
9.31, p < .01. However, there was a significant
triple interaction, F(1, 143) = 11.12, p
< .01; Newman-Keuls procedures indicated
that the triple interaction came from internal
females cooperating with a female or
competing with a male, thus increasing their
performance significantly more than internals
cooperating with males or competing
with females or external females cooperating
with males.


Table 1

Mean Differences Between First and Second
Administrations of a Digit-Symbol Task for
Internal and External Male and Female
Subjects Under Varying Testing Conditions

                       Internal            External
Partner             Female    Male     Female    Male

Cooperative instructions
  Same sex           9.43a    8.66      7.57     6.71
  Opposite sex       2.43b    7.32      1.71b    6.90
Competitive instructions
  Same sex           5.14b    7.61      6.00     5.32
  Opposite sex      10.96a    8.02      5.29     6.11

Note. Means sharing a superscript letter are significantly
different (ps < .05) from those sharing a different superscript
letter.
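
As a rough sketch of the bookkeeping behind these results, the snippet below enumerates the 16 cells of the 2 × 2 × 2 × 2 design and averages the second-minus-first difference scores within each cell. The record layout, the field names, and the example scores are assumptions made for illustration; they are not the authors' data or analysis code, and the snippet does not perform the analysis of variance or the Newman-Keuls comparisons.

```python
from collections import defaultdict
from itertools import product

# The four two-level factors named in the text.
FACTORS = {
    "locus": ("internal", "external"),
    "subject_sex": ("female", "male"),
    "partner_sex": ("female", "male"),
    "instructions": ("cooperative", "competitive"),
}


def design_cells():
    """Return every combination of factor levels (16 cells in all)."""
    return list(product(*FACTORS.values()))


def cell_mean_gains(records):
    """Average the second-minus-first difference score within each cell.

    Each record is a dict holding one level per factor plus the 'first'
    and 'second' digit-symbol scores (a hypothetical layout).
    """
    gains = defaultdict(list)
    for record in records:
        cell = tuple(record[name] for name in FACTORS)
        gains[cell].append(record["second"] - record["first"])
    return {cell: sum(values) / len(values) for cell, values in gains.items()}


# Two invented subjects in the same cell, for illustration only.
records = [
    {"locus": "internal", "subject_sex": "female", "partner_sex": "male",
     "instructions": "competitive", "first": 50, "second": 62},
    {"locus": "internal", "subject_sex": "female", "partner_sex": "male",
     "instructions": "competitive", "first": 48, "second": 58},
]
means = cell_mean_gains(records)
```

Table 1 labels partners as same sex or opposite sex, whereas the analysis of variance codes the partner as male or female; the two labelings describe the same cells once the subject's sex is known.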


Experiment 2 


It is interesting that locus of control ori- 
entation interacted with sex of partner and 
type of competitive activity for females. 
These female subjects performed best when 
competing against males (or cooperating 
with females against other teams). How- 
ever, a possible confounding effect on this 
finding was that many of the female subjects 
were members of the same sorority and may 
have known their female partners. This was 
brought to the attention of the experiment- 
ers by subject comments. Because the fe- 
male subjects may not have known their 
male partners as well as they knew their fe- 
male partners, the findings may not have 
resulted so much from a lack of willingness 
to cooperate with a male stranger but rather 
more from a willingness to cooperate with an 
acquaintance. By the same token, perhaps 
the subjects tried less in the competitive 
conditions to defeat females who may have 
been friends than males who may have been 
strangers.

The following study was completed to
assess the possibility that female subjects'
familiarity with other female subjects confounded
the locus of control/achievement
relation.


Subjects. The subjects were 60 females chosen from
the introductory psychology class at Emory. Their age
and socioeconomic levels were comparable with those
of subjects used in the first study. Since these subjects
were chosen from the general subject pool, it was as[...]
The subjects participated in the study to obtain course
credit.

Measures. The measures used in the first study were
used in this study.

Procedures. The procedures used in the first study
were followed here. The only exception was that only
female subjects were used in the present study, which
[...] 2 (Internal vs. External) × 2 (Male vs.
Female Partner) × 2 (Cooperation vs. Competition)
factorial design. Different males were used for the male
[...]

The results were substantially the same
as those found in the first study. Analysis
of variance procedures on the mean difference
scores produced in Table 2 indicated a
significant locus of control effect, F(1, 57) =
6.18, p < .01, and a three-way interaction,
F(1, 53) = 5.11, p < .01. Again, Newman-Keuls
post hoc analysis indicated the source
of the three-way interaction lay in internal
females cooperating with a female or competing
[...] thus increasing [...]
competing with a female or external females
cooperating with males. All results but those
of the external female groups were similar to
those found in the first study and seemed to
suggest that the significant relations were
not due to familiarity among the females.


General Discussion 


Female college students did not show the
expected theoretical relation between locus
of control and achievement performance.
Whereas internal males increased performance
on the digit-symbol task regardless of
sex of partner or competitive nature of the
task, internal females increased or decreased
their performance on the basis of the sex of
their partner and the type of competition.
Though the criterion of a digit-symbol task
may not be comparable with the criterion of
academic or professional success, it does not
suffer as the latter criteria do from the potential
confounding created by unequal
school admission and hiring procedures
based on sex.


Table 2

Mean Differences Between First and Second
Administrations of a Digit-Symbol Task for
Internal and External Female Subjects Under
Varying Testing Conditions

Partner             Internal    External

Cooperative instructions
  Same sex            8.88a       9.61a
  Opposite sex        3.33        2.61b
Competitive instructions
  Same sex            4.78        6.59
  Opposite sex       12.10        6.28

Note. Means sharing a superscript letter are significantly
different (ps < .05) from those sharing a different superscript
letter.

These results suggest that the locus of 
control/achievement relation is not a simple 
one for females. Heilbrun, Piccola, and
Kleemeier's (1975) supposition that female 
achievement behavior may involve more 
complex models seems to have gained some 
support. Competing with a male or female
makes a difference to an internal female. 
However, we do not know whether this 
relation is specific to college-age females or 
a more general phenomenon of other-age 
females. More research of a developmen- 
tal sort is needed to investigate this possi- 
bility. 

If these results are generalizable, they 
suggest that females' achievement performance
may not truly reflect ability. An internally
controlled female may show her
greatest achievement potential depending
on whether she is to compete or cooperate
with a male or a female. In achievement situations,
knowledge of locus of control orientation
may suggest the most efficient
learning procedure. In any case, these results
tend to confirm that the locus of control/academic
achievement relation may not
be a completely straightforward one for fe- 
males. Instead, the type of situation and the 
type of participants in that situation may 
have a significant effect on how an internal 
as opposed to an external female responds. 
Although males responded in a much more 
theoretically consistent manner than did 
females in the present study, there may be
those mediating characteristics that affect 
them similarly. 

The review of the state of research in the 
locus of control/academic achievement area 
as well as the reported results of empirical 
study suggests strongly that there are many 
potential complexities that need to be solved 
by research that focuses on mediational 
variables and interactions rather than 
"traits" and main effects. 


Reference Note 


1. Nowicki, S. Predicting academic achievement of 
females from a locus of control orientation: Some 
problems and some solutions. Paper presented at 
the meeting of the American Psychological Association,
Montreal, Canada, August 1973.


Social Comparison, Multiple Reference Groups, and the
Self-Concepts of Academically Handicapped Children
Before and After Mainstreaming


Louise Strang, Monte D. Smith, and Carl M. Rogers
George Peabody College for Teachers


Predictions adduced from social comparison theory and group reference theory
were tested in two experiments that assessed the impact of half-day integration
into the educational mainstream upon the self-concepts of academically
handicapped children. In the first experiment, mainstreamed children exhibited
significantly augmented self-concepts, a result attributed to the availability
of multiple comparative reference groups. In the second experiment, a
manipulation designed to restrict self-concept-relevant social comparisons to
children in the academic mainstream produced decreased self-regard, while
unrestricted utilization of multiple comparative reference groups produced increased
self-regard. The results were interpreted as supportive of the theoretical
viability of social comparison theory and group reference theory in educational
settings.


The growth of the self is a social phenomenon,
arising and developing in social
contexts (Mead, 1934). Next to the home,
the school is perhaps the most important
social force in shaping and maintaining the
child’s self-concept (Purkey, 1970). Our
educational philosophy of optimum development
of the whole child requires that educators
understand the effects of school organization
upon self-concept as well as upon
academic achievement. School organization
affects the child directly by allowing the
child to associate with certain children and
not with others during school hours. We
need to know the effects of specialized
groupings on children’s perceptions of
themselves in relation to significant others
in school. If children’s perceptions of
themselves as worthy individuals are affected
by special groupings, then the organizations
imposed by schools must be critically
analyzed. As the policy of including
exceptional children within the mainstream
of education becomes more prevalent,
knowledge of the effects of “mainstreaming”
upon self-concept becomes imperative.

The possible negative impact on children’s 
self-concept of special class placement has 
been a concern of educators for some time 
(Dunn, 1968; Jones, 1972; Meyerowitz, 1965, 
1967), but the evidence does not clearly 
support the expectation that special class 
assignment results in stigmatization and 
concomitantly diminished self-regard. 
Jones (1974), for example, reported that 
special class students reported more positive 
evaluations of their schools than regular class 
students. 

For academically handicapped children in 
segregated classrooms, the research is both 
scant and contradictory. Wagonseller 
(1972) compared the self-concepts of aca- 
demically handicapped, emotionally dis- 
turbed, and institutionalized emotionally 
disturbed children. All had comparably low
self-concept scores. Larsen, Parker, and
Jorjorian (1973) found significantly greater
discrepancies between real and ideal selves
in academically handicapped than in regular 
class children. On the other hand, Swanson 
and Parker (1971) found no significant 
self-concept differences between regular 
class and academically handicapped chil- 
dren. Piers (1969) also cited studies in 
which the self-concepts of children in special 
groups (including special education classes, 
classes for stutterers, classes for emotionally 
disturbed, and groups of economically de- 
prived children) were not significantly dif- 
ferent from those of regular class children. 
Since children assigned to special classes 
are typically segregated from mainstream 
children for most if not all of each school day, 
it may be that they predominantly utilize 
their immediate peer reference group in 
forming and maintaining their self-concepts. 
Social comparison theory (Festinger, 1954) 
suggests that in the absence of objective 
standards of comparison, people will employ 
significant others in their environment as the 
bases for forming estimates of self-worth. 
Also, given the choice of relatively similar 
and dissimilar others, similar individuals are 
more likely to be selected as the bases for 
social comparisons. In terms of education- 
ally handicapped children in special classes, 
they would be expected to base self-con- 
cept-relevant social comparisons on other 
academically handicapped children in the 
same classroom. When social comparisons 
are made with other children who possess 
similar academic handicaps, there is little 
reason to expect attenuated self-concepts.
It might be expected that only to the extent
that handicapped children are regularly
exposed to other, nonhandicapped children
would their estimations of self-worth be di-
minished. The presence of other children
without academic handicaps would intro-
duce another basis for social comparisons.
Since the new social comparison group gen-
erally would possess superior academic
performance capabilities, the self-concept of
the special class child might be diminished
somewhat (to the extent that the child uti-
lized the new reference children when mak-
ing self-concept-relevant social compari-
sons).

When the academically handicapped child
is integrated into regular classrooms (i.e.,
mainstreamed), a new peer reference group
is formed, consisting of children potentially 
possessing greater academic performance 
capabilities than those of the previous spe- 
cial class peer reference group. The main- 
streamed academically handicapped child 
may underachieve relative to his or her new 
classmates. Several studies have depicted 
underachievers as having low self-esteem 
(Combs, 1963; Fink, 1962; Goldberg, 1960; 
Shaw, 1961). Thus, the net effect of the 
mainstreaming experience might be a dimi- 
nution of self-regard. Relative to his or her 
new peer reference group, the child might be 
an academic underachiever. To the extent 
that the new group is utilized in making so- 
cial comparisons, self-regard (particularly in 
the area of intellectual and school status) 
might decrease. 

Prediction of self-concept change for 
children mainstreamed for part of each 
school day and taught in a special class for
the rest of the day requires additional con- 
siderations. Whereas a child in a special 
class or a child mainstreamed full day is 
limited to one group of peers, the child 
mainstreamed for only part of the day is 
provided with two groups, each having the 
capacity to serve as comparative reference 
groups. Not only would the child have a 
small group of children academically similar, 
but also the child would have a group labeled 
normal with whom to make additional 
self-other comparisons. Hyman and Singer 
(1971) and Rosenberg (1968) have pointed 
out that the individual may exercise rela- 
tively great freedom in choosing reference 
groups and may choose groups to enhance 
self-regard or to protect the ego. If en-
hancement of self-regard is both a function
of comparisons with similar others and the
freedom to select reference groups for these
comparisons, academically handicapped
children mainstreamed for half of each day
would be expected to exhibit an increased
self-concept.

Thus, academically handicapped children
in special classes would be expected to utilize
only one reference group in making self-
concept-relevant social comparisons and,
due to the similarity of other special class
children, would not be expected to exhibit an
attenuated self-concept. Academically
handicapped children reintegrated into the
mainstream on a full-day basis, however,
would not have similar others for compari- 
sons and therefore might experience a di- 
minished self-concept. Nevertheless, aca- 
demically handicapped children main- 
streamed for only part of the day, and hence 
free to choose between reference groups for 
self-other comparisons, would be expected 
to choose reference groups in order to en- 
hance the self-concept and thus might ex- 
perience augmented self-regard. 
Experiment 1 investigated the effects of 
a half-day mainstreaming experience upon 
the self-concepts of academically handi- 
capped children previously in segregated 
classrooms. The second experiment, con- 
ducted 1 year later, assessed the effects of 
limiting self-other comparisons to one reference
group following regular class integration
of academically handicapped children.


Experiment 1 


Method 


Subjects. Subjects were 50 children en- 
rolled in eight classrooms for the academi- 
cally handicapped in four elementary schools 
in a large metropolitan school system. 
When pretesting was conducted, the chil- 
dren had been enrolled in the special class- 
rooms an average of 12 academic months. 
The children ranged in ages from 6 years 2 
months to 10 years 10 months, with a mean 
age of 9 years 6 months, and were 78% male
and 84% Caucasian. The mean full scale
Wechsler Intelligence Scale for Children- 
Revised (WISC-R; Wechsler, 1974) IQ was 
86.96. 

Subject assignment. Each of the four 
schools had multiple segregated special 
classrooms. One special classroom in each
school was selected randomly to receive the
experimental treatment, which consisted of
half-day integration of the children into
regular classrooms. One of the remaining
segregated classrooms was selected ran-
domly from each school to provide compar-
ison data. Thus, there were four experi-
mental classrooms and four comparison
classrooms. Comparison classroom children
remained in their segregated classrooms
throughout each school day, while children 
in the experimental classrooms were indi- 
vidually integrated into regular classes for 
half of each school day during reading and 
math class periods. Enrollment in each of 
the eight special classrooms was limited to 
10 students. 
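
The classroom selection described above can be illustrated with a minimal sketch. It assumes each school supplies a list of its segregated special classrooms; the school and classroom labels and the `assign_classrooms` helper are hypothetical, since the article names neither schools nor classrooms and does not report how the random draw was carried out.

```python
import random


def assign_classrooms(classrooms_by_school, seed=None):
    """Randomly pick one experimental and one comparison classroom per school.

    Drawing two distinct classrooms without replacement mirrors the
    description in the text: one classroom is selected for treatment, and
    one of the remaining classrooms is selected for comparison.
    """
    rng = random.Random(seed)
    assignment = {}
    for school, classrooms in classrooms_by_school.items():
        if len(classrooms) < 2:
            raise ValueError(f"{school} needs at least two special classrooms")
        experimental, comparison = rng.sample(classrooms, 2)
        assignment[school] = {
            "experimental": experimental,
            "comparison": comparison,
        }
    return assignment


# Hypothetical labels for four schools with multiple special classrooms each.
schools = {
    "School A": ["A1", "A2", "A3"],
    "School B": ["B1", "B2"],
    "School C": ["C1", "C2", "C3"],
    "School D": ["D1", "D2"],
}
assignment = assign_classrooms(schools, seed=1)
for school, rooms in assignment.items():
    print(school, rooms)
```

Passing a fixed `seed` only makes the illustration repeatable; it is not part of the reported procedure.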

Instruments and data collection. The 
WISC-R and the Metropolitan Achievement 
Test (MAT; Durost et al., 1971) were ad- 
ministered to experimental and comparison 
children by trained and supervised personnel 
of the George Peabody College for Teachers’ 
Child Study Center on a pretest-posttest 
basis. These instruments were adminis- 
tered at the beginning of the school year in 
October and again at the end of the school 
year in May. 

Both groups were administered the 
Piers-Harris Children’s Self-Concept Scale 
(Piers, 1969; Piers & Harris, 1964) three 
times. The Piers-Harris (PH) scale was 
administered to experimental and compar-
ison children on a pretest-posttest basis at
the same time that academic and intellectual 
data were collected. Additionally, both 
experimental and comparison children were 
administered the PH scale 1 month after 
mainstreaming began, in either January or 
February, depending upon the exigencies of 
the participating schools. Thus, the PH 
data were collected in October at the first of 
the school year (Time 1), in February or 
March, 1 month after mainstreaming began 
(Time 2), and at the end of the school year in 
May (Time 3). Within each school, the ex- 
perimental and comparison classes were 
tested on the same day on all occasions. 
Testing was always conducted in the special 
classrooms. 

The PH scale consists of 80 declarative 
statements, to each of which the respondent 
indicates whether the item describes the way 
he or she feels. Approximately half the 
items are worded positively and half nega- 
tively in order to obviate possible social de- 
sirability and response acquiescence biases. 
The items were constructed at a third-grade 
reading level, but the scale may be used at 
lower reading levels when individually ad- 
ministered or when the administrator reads 
the items individually. Scores may range 
Table 1
Summary of Preliminary Analyses

                                          Experimental        Comparison
                                             group               group
Variable                                    M       SD          M       SD        F
Age in months                            117.40   17.46      110.28   12.99     2.83
Months in special class                   11.18    6.27        9.47    8.63      .49
Metropolitan Achievement Test
  Word Knowledge GE                        2.30     .68        2.14     .84      .54
  Reading GE                               1.97     .57        1.95    1.04      .00
  Total Reading GE                         2.15     .56        2.00     .81   [illegible]
  Math GE                                  2.70     .98        2.13     .70     5.25*
Full scale WISC-R IQ                      87.96    8.53       85.96   11.82      .31
Piers-Harris Children's Self-Concept Scale
  Composite                               52.60   13.37       53.08    9.35      .02
  Behavior                                12.36    4.38       12.40    2.84      .00
  Intellectual and School Status          11.68    3.30       11.36    3.03   [illegible]
  Physical Appearance and Attributes       7.76    2.59        7.32    2.55      .37
  Anxiety                                  7.00    2.60        7.32    2.17      .22
  Popularity                               6.92    2.69        6.72    2.95      .06
  Happiness and Satisfaction               6.64    1.93        6.76    1.59      .06

Note. GE = grade equivalent. WISC-R = Wechsler Intelligence Scale for Children-Revised.
* p < .025 (df = 1, 45).


from 0 to 80 on the total (or composite)
self-concept index. Additionally, the scale
may be scored for six cluster scores, pur-
porting to measure subdimensions of self-
regard: Behavior, Intellectual and School
Status, Physical Appearance and Attributes,
Anxiety, Popularity, and Happiness and
Satisfaction. The test manual (Piers, 1969)
reported Kuder-Richardson [...]
homogeneity coefficients ranging from .78 to
.93. Four-month test-retest coefficients of
stability ranged from .71 to .77.

The PH scale was administered in small
groups. Each item was read aloud by the
[administrator], and any unfamiliar words
were explained. When all children had
marked their responses in their individual
booklets, the next item was read and if nec-
essary explained further. The administra-
tor, [along] with the classroom teacher and the
teaching aide, circulated among the students
to ensure that no one was falling behind in
responding to the items. Written parental
permission was obtained before the PH scale
was administered.

Subject attrition. Due to analysis con-
siderations, subjects were retained in the
study only if PH data were available at all
three testing times. A total of 63 [experimental
and comparison children were tested]
at Time 1, 33 in the former and 30 in the
latter group. Attrition rates were 18% and
17% in the experimental and control groups,
respectively. Across both groups, attrition
was due to absenteeism on one or more of the
[...] children),
moving (3 children), [...]
child). Complete PH data were
collected for [27 experimental and] 25
comparison children. Ns for these two
groups were later equalized by randomly
excluding the data for two experimental
[sub]jects. An N of 50 was used in all
self-concept analyses [that] follow.


Results


[Preliminary analyses were]
conducted to check initial compara-
bility of the [experimental and comparison]
groups. Results of these [analyses are sum-]
marized in [Table 1. The experimental and com-]
parison groups did not differ initially (Time
1) in either race or sex composition, χ²(1) =
.60 and χ²(1) = .19, respectively. One-way
Table 2
Covariate and Criterion Means, Standard Deviations, and Adjusted Criterion Means for the
Experimental and Comparison Groups

                                                      Unadjusted criteria              Adjusted criteria
Dependent measure         Possible    Covariate        Time 2          Time 3          Time 2     Time 3
and group                 range       M      SD       M      SD       M      SD          M          M
Composite self-concept
  Experimental            0-80      52.60  13.37    58.92  11.91    63.68  11.94       59.04      63.80
  Comparison                        53.08   9.35    55.32  11.90    55.20  13.02       55.20      55.08
Behavior
  Experimental            0-18      12.36   4.38    13.56   3.72    14.72   3.74       13.57      14.73
  Comparison                        12.40   2.84    13.24   2.71    13.20   3.04       13.23      13.19
Intellectual and School Status
  Experimental            0-18      11.68   3.30    13.52   2.69    14.00   2.80       13.44      13.92
  Comparison                        11.36   3.03    12.20   3.48    11.60   3.81       12.28      11.68
Physical Appearance and Attributes
  Experimental            0-12       7.76   2.59     8.68   2.72     9.24   2.11        8.59       9.15
  Comparison                         7.32   2.55     6.92   3.30     7.20   3.11        7.01       7.29
Anxiety(a)
  Experimental            0-12       7.00   2.60     8.32   2.93     9.08   2.72        8.40       9.16
  Comparison                         7.32   2.17     7.88   2.22     7.76   2.73        7.80       7.68
Popularity
  Experimental            0-12       6.92   2.69     8.16   2.27     8.92   2.10        8.12       8.88
  Comparison                         6.72   2.95     7.60   2.47     7.40   2.52        7.64       7.44
Happiness and Satisfaction
  Experimental            0-9        6.64   1.93     7.52   1.64     7.88   1.62        7.55       7.91
  Comparison                         6.76   1.59  [illegible] 2.10   7.32   1.80        6.57    [illegible]

(a) Higher Anxiety cluster scores indicate lower levels of self-reported anxiety.


analyses of variance, summarized in Table 
1, indicated that the experimental and 
comparison groups were also initially com- 
parable on the variables of age, previous time 
in special classrooms, MAT Word Knowl- 
edge grade equivalents, MAT Reading grade 
equivalents, MAT Total Reading grade 
equivalents, full scale WISC-R IQ, total 
self-concept scores, and all six PH self-con- 
cept cluster scores. The two groups differed 
significantly on only one variable, MAT 
Math grade equivalents, F(1, 45) = 5.25, p 
< .025. Self-concept mean scores for both
groups were comparable to the norm group 
mean of 51.84 reported by Piers (1969). 
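
As a check on how the one-way F ratios in Table 1 relate to the tabled means and standard deviations, the following minimal sketch recomputes F from summary statistics alone. It assumes 25 children per group for the self-concept measures (consistent with the stated N of 50); the `one_way_f` helper is illustrative and is not the authors' procedure.

```python
def one_way_f(groups):
    """One-way ANOVA F ratio from summary statistics.

    groups: list of (n, mean, sd) tuples, one per group.
    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups sum of squares: weighted squared deviations of group means.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares recovered from each group's SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Physical Appearance and Attributes at Time 1 (Table 1), assuming n = 25 per group.
f_ratio, df_b, df_w = one_way_f([(25, 7.76, 2.59), (25, 7.32, 2.55)])
print(round(f_ratio, 2), df_b, df_w)
```

Under the equal-n assumption this reproduces the tabled value of .37 for that cluster. The achievement and IQ rows appear to rest on a different number of cases (the table note gives df = 1, 45), so they would need their own group sizes.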

Primary analyses. Repeated-measures
factorial analyses of covariance (with Time
2 and Time 3 data serving as repeated cri-
terion measures and Time 1 data as the co-
variate) were used to analyze the PH data for
the experimental versus comparison groups
(Winer, 1962). Separate analyses of covar- 
iance were conducted on the total (compos- 
ite) self-concept scores, as well as on each of 
the six derived cluster scores. 

The self-concept data for experimental 
and comparison groups are summarized in 
Table 2. Means and standard deviations are 
presented for the covariate (Time 1) mea- 
sures, as well as for the criterion measures 
(Time 2 and Time 3). Adjusted criterion 
means are presented in the last two col- 
umns. 
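
The relation between the unadjusted and adjusted criterion means can be sketched with the standard covariance adjustment, in which each group's criterion mean is shifted by the regression slope times that group's deviation from the grand covariate mean. The slope is not reported in the article; the value of 0.5 used below is an assumption back-solved from the composite rows of Table 2, and both helper functions are illustrative.

```python
def adjusted_mean(criterion_mean, covariate_mean, grand_covariate_mean, slope):
    """ANCOVA-adjusted criterion mean for one group."""
    return criterion_mean - slope * (covariate_mean - grand_covariate_mean)


def implied_slope(criterion_mean, adjusted, covariate_mean, grand_covariate_mean):
    """Back-solve the regression slope implied by a tabled adjusted mean."""
    return (criterion_mean - adjusted) / (covariate_mean - grand_covariate_mean)


# Composite self-concept, Table 2. Equal group sizes are assumed, so the grand
# covariate mean is the simple average of the two group means.
exp_pre, comp_pre = 52.60, 53.08
grand_pre = (exp_pre + comp_pre) / 2  # 52.84

# Slope implied by the experimental group's Time 2 entries (58.92 -> 59.04).
slope = implied_slope(58.92, 59.04, exp_pre, grand_pre)

exp_time2 = adjusted_mean(58.92, exp_pre, grand_pre, 0.5)
comp_time2 = adjusted_mean(55.32, comp_pre, grand_pre, 0.5)
exp_time3 = adjusted_mean(63.68, exp_pre, grand_pre, 0.5)
comp_time3 = adjusted_mean(55.20, comp_pre, grand_pre, 0.5)
print(round(slope, 2), round(exp_time2, 2), round(comp_time2, 2))
```

Because the two groups started at nearly the same covariate mean, the adjustment moves each composite mean by only about 0.12 points, in opposite directions for the two groups.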

The analyses of covariance upon com-
posite self-concept scores produced a sig-
nificant treatment main effect, F(1, 47) =
5.42, p < .025, indicating that the experi-
mental group adjusted means were signifi-
cantly greater than the comparison group
adjusted means. The main effect for trials
(i.e., measurement times) and the Treatment
× Trial interaction effect were not signifi-
cant, F(1, 48) = 2.67, p = .1051, and F(1, 48)
= 2.95, p = .0885, respectively. Figure 1
graphically depicts the mean covariate
(Time 1) and adjusted criterion (Time 2 and
Time 3) composite self-concept scores. The
comparison group increased 2.12 points from
Time 1 to Time 2 and then leveled off. The
slight mean increase by the comparison
group was probably a test-retest (or instru-
ment familiarity) effect characteristic of the
instrument. Piers (1969) reported that such
an effect has been found consistently in the
direction of a more positive self-concept.
The experimental group, on the other hand,
increased 6.44 points from Time 1 to Time
2 and then another 4.76 points from Time 2
to Time 3.

Figure 1. Mean composite self-concept scores for ex-
perimental and comparison groups at three measure-
ment times. (Vertical axis: mean self-concept. Horizontal
axis: measurement occasions 1 to 3. Separate lines for the
experimental and comparison groups.)

Analyses upon the PH cluster scores pro-
duced significant treatment main effects for
Intellectual and School Status, F(1, 47) =
5.93, p < .02, Physical Appearance and At-
tributes, F(1, 47) = 7.76, p < .01, Popularity,
F(1, 47) = 4.25, p < .05, and Happiness and
Satisfaction, F(1, 47) = 5.66, p < .025.
There were no significant main effects for
trials, nor were there any significant Treat-
ment × Trial interactions on any of the six
cluster score analyses. Adjusted mean
change scores on all seven self-concept in-
dices are summarized in Table 3. The ex-
perimental group had greater total gains on
all indices. In every case, the greater in-
crease was registered from Time 1 to Time
2. The assumption of homogeneity of
within-classes regression was violated in one
analysis, that for Physical Appearance and
Attributes, F(1, 46) = 7.51, p < .01.

One-way analyses of covariance were 
conducted to determine if the experimental 
and comparison groups differed significantly 
at the conclusion of the academic year on 
academic and IQ variables. Pretest scores 
on each variable were used as covariates. 
These analyses are summarized in Table 4. 
The two groups did not differ significantly 
on any of the variables tested. 


Discussion 


The academically handicapped children 
who were integrated into the educational 
mainstream for part of each school day ex-
hibited significantly augmented self-con-
cepts. One explanation for these results is
that these children selectively utilized the 
two available comparative reference groups 
in maintaining and augmenting their self- 
regard (Hyman & Singer, 1971; Rosenberg, 
1968). For academic-relevant social com- 
parisons, they may have selected the group 
most similar to themselves (Festinger, 1954), 
that is, the other academically handicapped 
children in their special classroom settings. 
For other self-concept-relevant comparisons, 
however, these children may have utilized 
their new regular classroom reference group. 
The mainstream integration experience may 
have imbued the children with an enhanced 
sense of belonging or of being more a part of 
the overall school environment. Assuming 
that the children liked specific aspects of 
both classes, it might have seemed that they 
were enjoying the best of both worlds. The 
net effect of providing the opportunity of 
exercising selectivity of comparison refer- 
ence groups was significantly augmented 
self-concept, even in the absence of sub- 
stantially improved academic performance. 
It was possible that the children felt more a
part of the main organization of the school;
and yet, the school organization treated 
them specially by including them in two 
groups, thereby providing the opportunity 

Table 3
Changes in the Piers-Harris Children's Self-Concept Scale Adjusted Mean Scores for the
Experimental and Comparison Groups from Time 1 to Time 2, Time 2 to Time 3, and Time 1 to
Time 3 (Total Change)

                                                   Change
Dependent measure                      Time 1 to    Time 2 to    Total (Time 1
and group                               Time 2       Time 3       to Time 3)
Composite self-concept
  Experimental                           6.44         4.76          11.20
  Comparison                             2.12         -.12           2.00
Behavior
  Experimental                           1.21         1.16           2.37
  Comparison                              .83         -.04            .79
Intellectual and School Status
  Experimental                           1.76          .48           2.24
  Comparison                              .92         -.60            .32
Physical Appearance and Attributes
  Experimental                            .83          .56           1.39
  Comparison                             -.31          .28           -.03
Anxiety
  Experimental                           1.40          .76           2.16
  Comparison                              .48         -.12            .36
Popularity
  Experimental                           1.20          .76           1.96
  Comparison                              .92         -.20            .72
Happiness and Satisfaction
  Experimental                            .91          .36           1.27
  Comparison                             -.19          .64        [illegible]


to select different reference groups for dif- 
ferent social comparisons. 

A plausible alternative explanation for the 
obtained results, however, is that the chil- 
dren who were integrated into the academic 
mainstream viewed the process as a success 
experience. Accordingly, the augmentation 
of self-concept could have resulted not be- 
cause of the availability of multiple com- 
parative reference groups, but rather be- 
cause the mainstreaming process was inter- 
preted by the children as indicating that they 
were successful academically. Experiment 
1 did not permit the elimination of this 
competing explanation. Moreover, Exper- 
iment 1 failed to provide insight into the 
question of what would have occurred had 
the children been integrated into the edu- 
cational mainstream for the entire school 
day. In this case, reference group selectivity 
would have been eliminated, and the only 
group available would have consisted of
[children possessing greater academic]
performance capabilities. Under these cir-
cumstances, the children’s self-regard may 
well have decreased. Experiment 2 tested 
the plausibility of the alternative success 
interpretation as well as explored the impact 
upon self-concept of full-day mainstream-
ing.


Experiment 2


Experiment 2 was conducted 1 year after
Experiment 1 and studied only children who
were integrated into the academic main-
stream. Since all children were main-
streamed for part of each day, the alternative
success explanation could not apply to some
[children and not to others]. Administra-
tively, it was no[t possible to mainstream]
some academically handicapped children
full day while others were mainstreamed
half day. Instead, a manipulation was de-
signed to enhance the saliency of regular
class group membership for some children
but not for others.

Table 4
Means, Standard Deviations, and Analysis of Covariance Fs for Experimental and Comparison
Groups on Academic and IQ Variables

                                                  Unadjusted           Adjusted
Dependent measure            Pretest              posttest             posttest
and group                   M        SD          M        SD              M            F
Metropolitan Achievement Test
 Word Knowledge GE
  Experimental             2.3       .7         2.6       1.1        [illegible]   [illegible]
  Comparison               2.1       .9         2.4   [illegible]        2.4
 Reading GE
  Experimental             1.9       .6         2.3       .9         [illegible]      .45
  Comparison               2.0      1.1         2.3   [illegible]        2.2
 Total Reading GE
  Experimental             2.1       .6         2.4       .8             2.4       [illegible]
  Comparison               2.0       .8         2.3   [illegible]        2.4
 Math GE
  Experimental             2.6      1.0         2.9       1.2        [illegible]   [illegible]
  Comparison               2.1  [illegible]     2.6       .8             2.9
Full scale WISC-R IQ
  Experimental            88.0      8.5        90.0      10.6           89.0       [illegible]
  Comparison              86.0     11.8        88.9      11.5           89.8

Note. GE = grade equivalent. WISC-R = Wechsler Intelligence Scale for Children-Revised.
a df = 1, 39.
b df = 1, 40.
c df = 1, 45.

It was predicted that the mainstreaming
experience would produce different effects
upon children’s self-concept, depending 
upon whether or not their regular classroom 
group membership was made salient, 
thereby eliminating their special classroom 
peers as a comparative reference group. For 
mainstreamed children permitted unre- 
stricted utilization of available reference 
group(s), a replication of Experiment 1 re- 
sults was predicted (i.e., increased self-con- 
cept). For mainstreamed children receiving 
the manipulation designed to focus their 
self-concept-relevant social comparisons 
upon the members of the classroom into 
which they had been integrated, however, a 
decreased self-concept was predicted. 


Method 


Subjects. Subjects were 20 academically handi- 
capped children (17 boys and 3 girls) enrolled in three 
segregated elementary school classrooms in a large 
metropolitan school system. Ages ranged from 8 years 


3 months to 11 years 0 months, with a mean age of 9 
years 7 months. Full scale WISC-R IQ scores ranged 
from 72 to 115, with a mean of 92.06. 

Instruments. Self-concept was assessed by the 
Piers-Harris Children's Self-Concept Scale, a self- 
report inventory described in Experiment 1. 

Procedure. Pretest self-concept measures were 
collected through group administrations at the begin-
ning of the school year. All participants were main-
streamed into regular classrooms for half of each school
day during math and reading periods beginning in No- 
vember. Six weeks after mainstreaming began, the 
children were individually readministered the self- 
concept scale. Within each of the three schools, chil- 
dren were randomly assigned to experimental and 
comparison groups: Experimental children were re- 
moved from their mainstreamed classrooms for testing, 
while comparison children were removed from their 
special classrooms. The procedure for administering 
the self-concept scale to comparison children was the 
same as that reported in Experiment 1, with the ex- 
ception of individual versus group testing. The nature 
of the task was explained to the comparison children, 
and they were explicitly assured of confidentiality (i.e.,
they were assured that parents, teachers, and other
children would not see their responses).

The experimental children received a manipulation 
designed to enhance the saliency of their membership 
in the regular classroom into which they had been 
mainstreamed and to reduce the probability that they 
would utilize their special classroom peers as the basis 
for making self-concept-relevant social comparisons.
Instructions to the experimental group children were
the same as those given to comparison children (i.e., the nature of
the task and confidentiality) plus the following:


[Child's name], you are a member of [mainstream 
classroom teacher's name] classroom. Here is a list 
of some of the children in your class. Let’s read the 
names together, and as we read them I want you to 
circle the names of the children whom you know. 
[Then, regular instructions about task and confi-
dentiality were given.] As you mark your answers
to these statements, you will find that it is some- 
times necessary to think of your classmates. For 
example, the first statement says, “My classmates 
make fun of me.” On this and on the other state- 
ments where it is necessary to think of your class- 
mates, remember to think of the children in [main- 
stream classroom teacher's name] classroom. 


The list of children presented to the experimental 
participants consisted of 12 names drawn randomly 
from the class roster of the regular classrooms into 
which the children had been mainstreamed. 
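
Drawing the 12-name list can be sketched as a simple random sample without replacement from the class roster. The roster entries and the `draw_classmate_list` helper below are hypothetical placeholders; the article does not say how the draw was performed.

```python
import random


def draw_classmate_list(roster, k=12, seed=None):
    """Draw k distinct names at random from a regular classroom roster."""
    if len(roster) < k:
        raise ValueError("roster has fewer names than requested")
    return random.Random(seed).sample(roster, k)


# Hypothetical roster of 28 children in one regular classroom.
roster = [f"Student {i:02d}" for i in range(1, 29)]
names = draw_classmate_list(roster, k=12, seed=7)
print(names)
```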


Results 


Analyses indicated that the random as- 
signment procedure successfully equated 
experimental and comparison groups on the 
variables of age, IQ, MAT scores, and self- 
concept scores. 

Analyses of the self-concept change scores 
indicated that the groups differed signifi-
cantly, F(1, 18) = 5.60, p < .03. While the
comparison group exhibited a mean increase 
of 7.30 points, the experimental group de- 
creased by an average of 2.50 points. An 
omega square was calculated, and a value of 
.187 was obtained, indicating that the group 
salience manipulation produced not only a 
statistically significant effect but also ac- 
counted for substantial variance in the de- 
pendent variable. 
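
The reported omega square can be recovered from the F ratio and sample size alone. The sketch below uses the usual fixed-effects estimate for a one-way between-subjects design, omega squared = df_between(F - 1) / [df_between(F - 1) + N]; that the authors used this particular formula is an assumption, although it reproduces their value.

```python
def omega_squared(f_ratio, df_between, total_n):
    """Estimate omega squared from a one-way between-subjects F ratio."""
    effect = df_between * (f_ratio - 1.0)
    return effect / (effect + total_n)


# Experiment 2: F(1, 18) = 5.60 with N = 20 children.
estimate = omega_squared(5.60, df_between=1, total_n=20)
print(round(estimate, 3))
```

With these inputs the estimate is 4.60 / 24.60, which rounds to the .187 reported in the text.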

Analyses of the PH cluster scores indi-
cated that significant group differences were
confined to the Intellectual and School
Status, F(1, 18) = 4.12, p = .0548, and Anx-
iety, F(1, 18) = 5.88, p < .025, clusters. The
[...] exhibited a mean increase
of 1.50 points on the Intellectual and School
[Status cluster ...] experimental group [...].
[On the Anxiety] cluster score, the comparison
group displayed a mean increase of 2.30 points
(toward being less anxious), while the ex-
perimental group exhibited a mean decrease
of .20 points.

General Discussion


In Experiment 1, children who were 
mainstreamed for half of each day exhibited 
significantly augmented self-concepts rela- 
tive to other academically handicapped 
children who remained in segregated special 
classrooms. Possibly, this occurred as a re- 
sult of the mainstreamed children's ability 
to selectively utilize different comparative 
reference groups. The mainstreamed chil- 
dren, however, could have viewed the regular 
classroom integration process as a success
experience. If the children viewed the 
mainstreaming experience as indicative of 
their progress and academic success, then 
this viewpoint also could have accounted for 
their increased self-regard. 

Experiment 2 utilized only children who 
were mainstreamed and manipulated the 
salience of regular classroom group mem- 
bership for some children but not for others 
in an attempt to control the number and 
composition of comparative reference groups 
utilized in making self-concept-relevant 
social comparisons. As predicted, main- 
streamed children with unrestricted freedom 
to utilize multiple reference groups exhibited 
increased self-concept, while mainstreamed 
children who were restricted to their regular 
classroom peers in making social compari- 
sons exhibited decreased self-concept. 
Taken together, the two experiments pro- 
vide rather striking confirmation of the hy- 
potheses derived from social comparison 
theory (Festinger, 1954) and group reference 
theory (Hyman & Singer, 1971; Rosenberg, 
1968). 

Although local norms on the PH scale 
were not available, educationally handi- 
capped children in both experiments initially 
exhibited mean self-concept scores compa- 
rable to the norm mean reported by Piers 
(1969). The pretest composite self-concept 
mean scores for all children in Experiments 
1 and 2, respectively, were 52.84 and 52.40, 
compared to the mean reported by Piers of 
51.84 based on the scores of 1,183 regular 
classroom children. It may be inferred that
initially the children in both experiments 
were basing their self-evaluative judgments 
on other academically handicapped children 
in their own classroom. Morse and Gergen
(1970) pointed out that although an indi-
vidual associates with many others in a day,
not all of those others serve as bases for 
comparison. Only those who provide in- 
formation that is valuable in assessing the 
individual's position and in planning his or 
her behavior are deemed "useful." Addi- 
tionally, they suggested that “the mere 
presence of another person who is like one- 
self may be sufficient to boost one's self- 
esteem" (Morse & Gergen, 1970, p. 154). 
Moreover, Festinger (1954) stated that in- 
dividuals compare themselves with those 
who are more similar than dissimilar. “The 
tendency to compare oneself with some other 
specific person decreases as the difference 
between his ... ability and one’s own in- 
creases” (Festinger, 1954, p. 120). There- 
fore, both the perceived similarity of other 
people and the usefulness of comparisons 
with them appear to be factors in the selec- 
tion of comparison individuals and/or 
groups. 

Moreover, when the academically handi- 
capped children were integrated into regular 
classrooms for half of each day, they exhib- 
ited significantly increased self-regard. A 
crucial point relevant to the half-day main- 
streaming situation was made by Festinger 
(1954) in reference to a person who deviates 
toward the lower end of an ability scale: 
“Provided he has other comparison groups 
for self-evaluation on this ability he may 
remain personally and privately quite un- 
affected by this group situation” (p. 138). 
Theoretically, a special class child who has 
similar children with whom to make useful 
comparisons has a basis for a positive self-
concept, even when the child is integrated
into regular classroom settings for part of 
each day. The special class child who is in- 
tegrated into the educational mainstream for 
part of each day is provided with two peer 
reference groups. Presumably, as Hyman 
and Singer (1971) have emphasized, the 
child may exercise relative freedom in se- 
lecting which group to use in making self- 
concept-relevant comparisons: 

Self-appraisal rests on the framework [...] comparison, and the choice of a comparative [...]
maintains, enhances, or injures self-regard. (p. 76)
For social comparisons, he chooses a [...] enhance his self-regard or protect [...].
Thus he maintains some control over his own self-regard
by his choice of comparative reference groups and
guides his own fate accordingly. (p. 77)


Moreover, Rosenberg (1968) emphasized
that the individual may select different ref- 
erence groups for different self-relevant 
comparisons. 

When academically handicapped children
were restricted to the utilization of a single 
comparative reference group consisting of 
regular classroom children following main- 
streaming, however, their self-regard de- 
creased. The nature of the reference group 
restriction consisted of verbal directions to 
compare themselves only with regular 
classroom children. Although effective, this 
manipulation cannot be considered an equal 
substitute for the actual experience of full- 
day mainstreaming. The experience of the 
classroom as the sole reference group
may have had greater impact on the self-
concepts of the experimental children.
Their self-concepts might have been lowered
even further by the actual experience of
full-day integration. The slight decrease
exhibited by the experimental children in
Experiment 2 increases in importance,
moreover, when one considers that the usual
test-retest effect is a mean increase of from
two to five points (cf. Experiment 1’s com-
parison group and Piers, 1969).

As well as clearly indicating that children 
use classroom reference groups in forming
and maintaining conceptions of themselves,
the results provide insight into the process
of self-concept-relevant social comparisons.
When similar others are available, children
use those who are similar and disregard those
who are not similar, thus protecting their
self-concepts from possible diminution. On
the other hand, when similar others are re-
moved as a source of comparison, self-con-
cept declines if those remaining are superior
on the relevant ability dimension. Hyman 
and Singer's (1971) hypothesis that com- 
parisons that provide useful information are 
selectively utilized to enhance self-esteem 
was supported by these experiments, as was 
Festinger's (1954) hypothesis that persons 
at the lower end of an ability scale may re- 
main unaffected by a group situation pro- 
vided they have other relevant comparison 
groups for self-evaluation. 




In addition to adding to our knowledge of 
social comparison and reference group 
theory, the results of these experiments 
present important practical considerations 
for those involved in educational program 
planning. Mainstreaming can be a valuable 
experience for academically handicapped 
children accompanied by an increase in 
self-esteem provided contact with similar 
others is maintained. Sudden, full-day in- 
tegration into regular classrooms, on the 
other hand, might be seriously detrimental 
to the self-regard of the academically 
handicapped child. 

Additional research might fruitfully ex-
plore the extent to which these findings are
generalizable to other categories of handi-
capped children. Also, future research
might focus more attention on specific sub- 
dimensions of self-concept. The only cluster 
score to reflect significant group differences 
in both experiments, for example, was In- 
tellectual and School Status. Conceivably, 
globalistic assessments of self-concept might
fail to reflect significant group differences 
under some circumstances.


Individual Differences in Learning: Visual
Versus Auditory Presentation


This study concerned the possible interaction of individual differences in
learning with mode of presentation. The subjects were college students; data 
analyses were replicated by conducting separate analyses for two groups of 77 
and 83 subjects, respectively. Each subject learned 4 test lists of 20 words 
each, 2 under auditory presentation and 2 under visual presentation. The
main analyses indicated that individual differences in learning were reliable 
and that individual differences were just as predictable across modalities as 
within modalities. A complementary analysis showed that subjects could not 
be reliably classified in terms of auditory-visual preference scores. The find-
ings gave no support to the contention that subjects can be classified as audi-
tory learners or visual learners.


Choosing between auditory and visual 
presentation of learning materials is a deci- 
sion of potentially great importance for in- 
struction. Separate from the question of 
whether one mode of presentation is on the 
average better or worse than the other, there 
appears to be a general belief that within a 
group of students, some will learn better 
from visual presentation, while others will 
learn better from auditory presentation. 
For example, Kolson and Kaluger (1963) 
admonish teachers to identify the “auditory 
learners” and “visual learners” in their 
classes and to adjust instruction appro- 
priately. In arguing for the use of different 
media for instruction, Romiszowski (1974, 
p. 50) aptly described this belief in stating 
that "some learners may learn better from 
certain media than from others (not very 
much is known about such factors, though 
they do seem to exist).” 

One kind of evidence that has been inter- 
preted as demonstrating the existence of 
auditory learners and visual learners is a low 


This study is based on a thesis submitted by the 
first author in partial fulfillment of the requirements 
for the Master of Arts degree in special education. A 
partial report of the results was made at the annual 
meeting of the Illinois Council for Exceptional Children, 
Chicago, November 1977. 

Requests for reprints should be sent to Roger L. 
Dominowski, Department of Psychology, University of 
Illinois at Chicago Circle, Chicago, Illinois 60680. 


correlation between auditory and visual
learning scores. In reviewing earlier studies
providing such evidence, Jensen (1971) 
criticized this interpretation, arguing that 
the cross-modal correlation must be com- 
pared to the intramodal correlations. Only 
if the auditory-visual (AV) correlation is 
significantly lower than the auditory-audi- 
tory (AA) and visual-visual (VV) correla- 
tions is there evidence for auditory learners 
and visual learners. Application of this 
criterion results in the characterization of 
the earlier work as, at best, inconclusive. 
The appropriateness of Jensen’s analysis can 
be illustrated with a more recent study. On 
the basis of very low AV correlations, Snyder 
and Pope (1972) concluded that among 6 
year olds, capability within one modality 
does not mean capability within the other. 
However, inspection of their data indicates 
that the AV correlations were not system- 
atically lower than intramodal (AA and VV) 
correlations. Rather than concluding that 
auditory and visual abilities were indepen- 
dent, the more appropriate conclusion is that 
no ability had been reliably measured.
Jensen (1971) examined individual dif- 
ferences in auditory and visual memory 
using a short-term memory task. In his 
study, the AA and VV correlations were high, 
indicating reasonable reliability for the 
within-modality scores. However, the AV
correlations were equally high; individual
differences in memory were just as predict-
able across modalities as within a modality.
In brief, there was no evidence for the exis- 
tence of auditory learners and visual learn- 
ers. 

The present study was an extension of 
Jensen’s (1971) research, differing in that a 
multiple-trial, list-learning task was em- 
ployed. In terms of rough “face validity,” 
Jensen’s findings might not have strong 
implications for instruction because his 
task —remembering a string of digits for a 
few seconds—does not resemble ordinary 
classroom learning. More to the point, re- 
search indicates that individual differences 
in short-term memory are at best modestly 
related to individual differences in list 
learning (Gorfein, Bennett, Arbak, & Graves, 
1969; Gorfein & Blair, 1971). Consequently, 
Jensen’s findings leave open the question of 
whether individual differences are as pre- 
dictable across modalities as within modal- 
ities when the task involves learning larger 
amounts of material and retention over in- 
tervals longer than a few seconds. The 
present research addresses this question. 


Method 


Materials 


From the Paivio, Yuille, and Madigan (1968) norms, 
120 nouns were selected that had Thorndike-Lorge 
(Thorndike & Lorge, 1944) frequencies of 20 or more 
and ratings of 5.0 or greater on the concreteness, imag- 
ery, and meaningfulness scales (a total of 128 words in 
the norms met these criteria). Six lists were formed by 
ordering the 120 words alphabetically and then as- 
signing words in alternation to Lists A through F. For 
each list, a single random order of the 20 words was
used for presentation.
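
As an illustration only, the list construction described above (alphabetical ordering, assignment in alternation to Lists A through F, and a single random order per list) can be sketched in Python. The placeholder words, the function name build_lists, and the fixed random seed are assumptions introduced for this sketch; the 120 nouns and the random orders actually used in the study are not reproduced here.

```python
import random
from string import ascii_uppercase


def build_lists(words, n_lists=6, seed=0):
    """Sort words alphabetically, deal them in alternation into n_lists
    lists, then fix one random presentation order for each list."""
    ordered = sorted(words)
    # Word 1 goes to List A, word 2 to List B, ..., word 7 to List A again.
    dealt = {ascii_uppercase[i]: ordered[i::n_lists] for i in range(n_lists)}
    rng = random.Random(seed)  # hypothetical seed, only for repeatability
    return {name: rng.sample(items, len(items)) for name, items in dealt.items()}


# Hypothetical stand-ins for the 120 selected nouns.
words = [f"noun{i:03d}" for i in range(120)]
lists = build_lists(words)

for name in sorted(lists):
    print(name, len(lists[name]), lists[name][:3])
```

Dealing the alphabetized words in alternation gives each list 20 words spread evenly across the alphabet, which is one plausible reason for choosing this scheme over assigning contiguous blocks.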


Subjects and Procedure 


The subjects were 160 students, who participated to
fulfill requirements for an introductory psychology
course. Auditory (A) and visual (V) pre-
sentations were alternated: Roughly half the subjects
learned their six lists under an AVAVAV order, while
the remainder received a VAVAVA order. Similarly,
approximately half the subjects received visual pre-
sentation of Lists A, B, and C (auditory presentation of
Lists D, E, and F), with the remainder receiving visual
presentation of Lists D, E, and F.

For each list, four study-test trials were given. 
During the study portion of a trial, the words were 
presented one at a time at a 1½-sec rate; following
presentation of the last word in the list, subjects were 
given 90 sec for written recall of the words in any order 
they chose. For visual list presentation, a Kodak car- 
ousel projector was used to project the words on a screen 
at the front of the room; a tape recorder was used for 
auditory presentation of the words (recorded by a fe- 
male speaker). Upon completion of the fourth trial on 
a list, a brief rest was given, after which the subjects 
were informed that trials for the next list would begin. 
After the four trials on the last list, subjects received a 
brief explanation of the purpose of the study and were 
dismissed. 

Preliminary analyses indicated that there were no 
significant effects associated with particular sessions 
in which data were collected or with the particular lists 
(A, B, and C vs. D, E, and F) learned under a given 
presentation mode. These variables were ignored in the 
analyses to be presented. The data for subjects who 
received lists under an AVAVAV order (auditory-first 
group; n = 77) were kept separate from those for
subjects receiving a VAVAVA order (visual-first group;
n = 83) in order that the analyses could be repli- 
cated. 


Results 


Data analyses were based on four scores 
for each subject. Each score consisted of the
total number of words correctly recalled over 
four trials, one score per test list. The scores 
were distinguished in terms of presentation 
mode and learning order within mode (first 
or second and auditory or visual test list) and 
were labeled as follows: AUD-1, AUD-2, 
VIS-1, and VIS-2. Two additional scores 
were computed for each subject: AUD- 
SUM = AUD-1 + AUD-2, and VIS-SUM = 
VIS-1 + VIS-2. Summary data are shown 
in Table 1.

Two points can be made about the data in 
Table 1. First, practice lists were given in 
order to minimize learning-to-learn and 
warm-up effects in the data to be analyzed. 
Comparison of the means for successive test 
lists within modalities (AUD-1 vs. AUD-2 
and VIS-1 vs. VIS-2) for each group indi- 
cated that no such effects were present 
(largest F = 2.16, p > .05). Second, in each 
group, overall performance under visual 
presentation (VIS-SUM) was slightly better 
than that under auditory presentation 
(AUD-SUM), F(1, 76) = 15.13, p < .01, and
F(1, 82) = 11.29, p < .01, respectively.




CAROL J. DEBOTH AND ROGER L. DOMINOWSKI 


Table 1
Summary Data for Auditory and Visual Learning Scores

            Auditory-first group (n = 77)    Visual-first group (n = 83)
Score         M       SD     Range             M       SD     Range
AUD-1        49.2     9.4    28-68            45.7    11.4    18-69
AUD-2        48.9    10.6    24-73            46.5    11.5    18-74
VIS-1        51.5     9.2    27-69            47.9    10.0    19-75
VIS-2        51.9    10.0    28-74            49.3    10.6    23-74
AUD-SUM      98.1    18.4    55-133           92.2    21.6    37-141
VIS-SUM     103.4    18.1    56-142           97.2    19.1    42-148


Note. AUD-1, AUD-2, VIS-1, and VIS-2 refer to the total number of words recalled for the first and second auditory and visual 
tasks, respectively. AUD-SUM equals AUD-1 plus AUD-2, and VIS-SUM equals VIS-1 plus VIS-2. 


The major purpose of the research was to 
compare the predictability of individual 
differences in learning within and across 
presentation modalities. In the present 
data, within-modality (AA and VV) corre- 
lations are provided by the AUD-1/AUD-2 
and VIS-1/VIS-2 correlations, whereas the 
AUD-SUM/VIS-SUM correlation repre- 
sents the cross-modality (AV) correlation. 
The relevant data are displayed in Table 2. 
According to Jensen (1971), the proper 
comparison involves computing an esti-
mated AV correlation using the following
formula:

$$\text{estimated } r_{AV} = \frac{r_{AV}}{\sqrt{r_{AA}\, r_{VV}}}$$

In principle, the estimated AV correlations
will equal 1.00 if individual differences do
not interact with presentation modality.
Should the estimated AV correlation be
substantially lower than 1.00, one can claim
evidence for the existence of auditory
learners and visual learners.

Insertion of the appropriate values from


Table 2
Pearson Product-Moment Correlation Coefficients

Variable             Auditory-first group    Visual-first group
AUD-1/AUD-2                 +.70                   +.77
VIS-1/VIS-2                 +.77                   +.72
AUD-SUM/VIS-SUM             +.79                   +.79

Note. AUD-1, AUD-2, VIS-1, and VIS-2 refer to the total
number of words recalled for the first and second auditory and
visual tasks, respectively. AUD-SUM equals AUD-1 plus
AUD-2, and VIS-SUM equals VIS-1 plus VIS-2.


Table 2 into the formula yields estimated AV 
correlations of +1.09 and +1.08 for the two 
groups. Such “impossible” values are quite 
similar to those obtained by Jensen (1971) 
and clearly contradict the hypothesis that 
individual differences are not stable across 
modalities. 
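
The computation can be sketched as follows, assuming that the formula is the familiar correction for attenuation, that is, the cross-modal correlation divided by the square root of the product of the two within-modality correlations. The function name estimated_av and the dictionary layout are assumptions made for this sketch; the numerical inputs are the rounded entries of Table 2.

```python
import math


def estimated_av(r_av: float, r_aa: float, r_vv: float) -> float:
    """Cross-modal correlation divided by the geometric mean of the
    auditory-auditory and visual-visual correlations (assumed formula)."""
    return r_av / math.sqrt(r_aa * r_vv)


# (r_AV, r_AA, r_VV) taken from the rounded Table 2 entries.
TABLE2 = {
    "auditory_first": (0.79, 0.70, 0.77),
    "visual_first": (0.79, 0.77, 0.72),
}

estimates = {group: estimated_av(*rs) for group, rs in TABLE2.items()}
for group, value in estimates.items():
    print(f"{group}: estimated AV correlation = {value:.3f}")
```

Because the inputs are rounded to two decimals, the values obtained this way need not agree with the reported estimates in the second decimal place; the relevant point is that both exceed 1.00.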

It might be argued that the above esti- 
mates are spuriously high because the 
AUD-SUM and VIS-SUM scores are based
on tests that are twice as long as those
yielding the scores entering into the
within-modality correlations. To accom-
modate this argument, one could estimate
the reliabilities of the AUD-SUM and VIS-
SUM scores by applying the Spearman-
Brown prophecy formula (Guilford, 1965) to
the obtained AUD-1/AUD-2 and VIS-1/
VIS-2 correlations. Application of the
Spearman-Brown formula yields estimated 
reliabilities as follows: For the auditory-first 
group, +.82 for AUD-SUM and +.87 for 
VIS-SUM; for the visual-first group, +.87 for 
AUD-SUM and +.84 for VIS-SUM. If these 
estimated reliabilities are inserted (as rAA
and rVV) into the formula for estimating rAV,
the resulting estimated AV correlations are
+.93 and +.92 for the two groups. While
such doubly estimated AV correlations are 
less than 1.00, they are sufficiently near 
unity to support the conclusion that indi- 
vidual differences in learning can be pre- 
dicted just as well across presentation 
modalities as within modalities. 
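
A minimal sketch of this two-step adjustment follows, assuming the standard Spearman-Brown form for a test lengthened by a factor k, r' = kr / (1 + (k − 1)r), with k = 2, and assuming the attenuation-type formula for the estimated AV correlation. The function and variable names are hypothetical, and the inputs are the rounded Table 2 entries.

```python
import math


def spearman_brown(r: float, factor: float = 2.0) -> float:
    """Reliability of a test lengthened by `factor`, given reliability r."""
    return factor * r / (1.0 + (factor - 1.0) * r)


def estimated_av(r_av: float, r_aa: float, r_vv: float) -> float:
    """Cross-modal correlation corrected by the within-modality values."""
    return r_av / math.sqrt(r_aa * r_vv)


# Within-modality correlations and AUD-SUM/VIS-SUM correlations (Table 2).
WITHIN = {
    "auditory_first": {"AUD": 0.70, "VIS": 0.77},
    "visual_first": {"AUD": 0.77, "VIS": 0.72},
}
R_AV = {"auditory_first": 0.79, "visual_first": 0.79}

# Step 1: estimated reliabilities of the SUM scores.
reliabilities = {
    group: {mode: spearman_brown(r) for mode, r in rs.items()}
    for group, rs in WITHIN.items()
}
# Step 2: doubly estimated AV correlations.
doubly_estimated = {
    group: estimated_av(R_AV[group], rel["AUD"], rel["VIS"])
    for group, rel in reliabilities.items()
}

for group in WITHIN:
    print(group, reliabilities[group], round(doubly_estimated[group], 3))
```

With these rounded inputs the estimated reliabilities round to the values given in the text, and both doubly estimated correlations fall just below 1.00, close to the reported figures; small discrepancies in the second decimal are to be expected from rounding.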

An alternative method that has been used
to study individual differences in auditory
and visual learning involves a two-step pro-
cedure. An initial measurement episode is
used to label subjects with respect to moda-
lity preference, either A > V or A < V. Once
identified, these two groups of subjects are 
then compared in terms of performance on 
subsequent auditory and visual tasks. The 
rationale is that if auditory learners and vi- 
sual learners exist, groups initially differing 
in terms of A − V difference scores should
differ significantly when A − V difference
scores are measured a second time. Results 
based on this methodology have been in- 
consistent (Bateman, 1967; Lilly & Kelleher, 
1973; Waugh, 1973). Data from the present 
study could be arranged to conform to this 
design by selecting extreme scorers on the 
first auditory and visual test lists (AUD-1 −
VIS-1) and comparing these groups in terms
of A − V differences on the subsequent test
lists (AUD-2 − VIS-2). The main analysis
strongly implies that A − V difference scores
will not be stable across measurement epi- 
sodes. Nonetheless, the analysis was con- 
ducted to provide an alternative examina- 
tion of the question and because it was pos- 
sible that a comparison of extremes might 
yield results not apparent when all subjects 
were included in the analysis. 

Each subject's (AUD-1 − VIS-1) differ-
ence score was computed, and the subgroups
to be compared were composed of those 
subjects whose difference scores were more 
than one standard deviation away from the 
mean for the entire group. For the audi- 
tory-first group, the mean difference score 
was −2.3 (SD = 6.5). The auditory prefer-
ence subgroup included subjects with dif-
ference scores of +5 or greater (n = 13),


Table 3
Mean Difference Scores (AUD-2 − VIS-2)^a for Subjects Initially Classified into Auditory
Preference and Visual Preference Subgroups

Initial                  Auditory-first group    Visual-first group
classification              M        SD             M        SD
Auditory preference        −2.2     11.0            .5       7.1
Visual preference          −2.0      8.5           −3.7      8.6

^a AUD-2 − VIS-2 is the difference in total number of words
recalled for the second auditory task and the second visual
task.



while subjects in the visual preference
subgroup had difference scores of −9 or less
(n = 11). For the visual-first group, the
comparable data were M = −2.2 (SD = 8.5).
Auditory preference included difference
scores of +7 or greater (n = 13), and visual
preference included scores of −11 or less (n
= 13). The auditory preference and visual
preference subgroups were then compared
in terms of A − V differences on the subse-
quent test lists (AUD-2 − VIS-2). Sum-
mary data are shown in Table 3.
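
The selection rule can be checked with a short sketch. It assumes that difference scores are whole numbers (each is a difference of two word counts) and that "more than one standard deviation away" is a strict inequality; the function names preference_cutoffs and classify are hypothetical.

```python
import math
from typing import Optional, Tuple


def preference_cutoffs(mean_diff: float, sd_diff: float) -> Tuple[int, int]:
    """Return (auditory_min, visual_max): the smallest integer score strictly
    above mean + SD and the largest integer score strictly below mean - SD."""
    auditory_min = math.floor(mean_diff + sd_diff) + 1
    visual_max = math.ceil(mean_diff - sd_diff) - 1
    return auditory_min, visual_max


def classify(diff: int, auditory_min: int, visual_max: int) -> Optional[str]:
    """Label one AUD-1 minus VIS-1 difference score, or None if not extreme."""
    if diff >= auditory_min:
        return "auditory"
    if diff <= visual_max:
        return "visual"
    return None


# Means and SDs of the (AUD-1 minus VIS-1) scores as reported in the text.
print(preference_cutoffs(-2.3, 6.5))  # auditory-first group
print(preference_cutoffs(-2.2, 8.5))  # visual-first group
```

Under these assumptions the rule yields cutoffs of +5 and −9 for the auditory-first group and +7 and −11 for the visual-first group, in agreement with the values stated in the text.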

In Table 3, it can be seen that for the au- 
ditory-first group, the means for the two 
preference subgroups are virtually identical 
(F <1). For the visual-first group, the mean 
difference is in the proper direction but falls 
far short of significance, F(1, 24) = 1.62, p > 
.20. The analysis indicates that subjects
selected for extreme A − V differences in one
measurement episode do not maintain the
separation when A − V differences are as-
sessed a second time. The results support
the conclusion that individual differences in 
learning do not reliably interact with pre- 
sentation modality. 


Discussion 


The present findings provide no support 
for the contention that within a group of 
learners, one will find auditory learners and 
visual learners. Rather, one can predict 
individual differences in learning just as well 
across presentation modalities as within 
modalities. The results support Jensen’s 
(1971) conclusion and indicate that the 
conclusion applies whether one considers 
short-term memory (Jensen, 1971) or a task 
involving larger amounts of material and 
longer retention intervals. 

Several aspects of the present data are 
worth emphasizing. First, there was con- 
siderable variation in learning among the 
subjects, as shown in Table 1. Second, in- 
dividual differences in learning were quite 
stable, as evidenced, for example, by the 
within-modality correlations. A critical 
point is that these individual differences 
were just as stable across presentation 
modalities. The picture that emerges is that 
given a fixed presentation of learning ma-
terials (in whatever modalities), acquisition
will vary widely among a group of learners.
Perhaps the search for so-called auditory 
learners and visual learners has directed 
attention away from these more fundamen- 
tal individual differences in learning, which 
are obviously deserving of investigation. 

If individual differences do not interact
with presentation modality, then one can
choose the modality that leads to the best 
average performance. However, research 
findings regarding the relative merits of 
auditory versus visual presentation are in- 
conclusive and fraught with interpretive 
difficulties. There is abundant evidence 
that auditory presentation results in better 
short-term memory than visual presentation 
(Penney, 1975), although there are some 
exceptions (Freides, 1974). With respect to 
learning tasks involving larger amounts of 
material, longer retention intervals, and 
multiple learning trials, findings are quite 
mixed. One review of the literature led to 
the conclusion that auditory presentation is 
typically superior (Postman, 1975), yet a 
number of researchers have found visual 
presentation superior (Berry, Detterman, & 
Mulhern, 1973; McCall & Rae, 1974; Siegel 
& Allik, 1973) as in the present study. 

For theoretical purposes, comparisons of 
auditory and visual presentation pose many 
problems and seem subject to complex in- 
teraction effects. For example, modality 
differences might interact with presentation 
rate (e.g., Siegel & Allik, 1973) or with oral 
versus written recall (McCall & Rae, 1974). 
Even if items are presented at the same rate
under both presentation modes, visually 
presented items are likely to be “on” slightly 
longer than when auditory presentation is
employed (Berry et al., 1973). In several
studies reporting visual superiority, visual 
presentation of pictures has been compared 
with auditory presentation of the names of 
the pictures, thus confounding presentation 
mode with the material to be learned. 
Consequently, the findings are theoretically 
ambiguous, particularly since research has 
shown that pictures lead to better learning 
than presented words (see Paivio,


Although such complexities present for- 
midable theoretical problems, they might be 
less critical for practical purposes. Class-
room instruction seldom involves extremely
short presentation rates or retention inter-
vals. It can be argued that visual presenta-
tion has certain characteristics that muddle
theoretical work but that can be used to ad-
vantage in instruction. For example, if
pictorial presentation is superior to either
visual or auditory word presentation, the
choice seems clear. Graphic material that
provides organizational aids can realistically
be presented only visually. Visually pre-
sented material can be allowed to remain in
view, thus providing an external memory aid
or the additional study time that is not pos-
sible with auditory presentation. Such
techniques are worth using regardless of the
resolution of the fundamental theoretical
questions regarding auditory versus visual
presentation.

A clear limitation of both the present
study and Jensen’s (1971) research is that
college students served as subjects. Thus,
it could be argued that auditory learners and
visual learners do not exist among college
students but might be found among younger
learners. As stated earlier, there was ample
variation in learning among the subjects of
the present study; consequently, homoge-
neity of learners is not itself the point in
question. The argument might be phrased
as follows. Among college students (or other 
fully developed learners), individual differ- 
ences in learning and retention reflect dif- 
ferences in relevant prior knowledge, learn- 
ing strategies, and depth of processing. 
With younger learners, in addition to these 
factors, there might be differences in re-
sponsiveness to material presented in dif-
ferent modalities. Gibson (1974) has spec-
ulated that access to meaning through vari-
ous inputs might be a consequence of de-
velopment. That is, meaning is elicited by
objects fairly early in development, followed 
in rough order by pictures, spoken words, 
and printed words. For the less-developed 
individual, visual presentation of a word 
might require recoding into an acoustic form, 
whereas meaning can be directly accessed
from either visual or auditory presentation 
for the more mature person. 

Two implications can be drawn from these 
ideas. First, one might expect to find the 
difference between auditory and visual
presentation to change with developmental
level, with auditory presentation relatively
more advantageous for younger learners.
Second, one might expect to find auditory
learners and visual learners among a group
of developmentally immature learners while
not identifying such differences among more
mature learners.

At present, the evidence is scant and am-
biguous. For example, Siegel and Allik
(1973) found that visual presentation was
superior to auditory presentation for kin-
dergarten, second-grade, fifth-grade, and
college students. However, since the visual
stimuli were pictures, the findings have un-
certain relevance for the implications of
Gibson's reasoning. Williams, Williams,
and Blumberg (1973) also found visual pre- 
sentation superior for students in Grades 2 
through 10, although there were suggestions 
of complicating interactions with social class 
at lower grade levels. Attempts to identify 
auditory learners and visual learners among 
young school children have met with decid- 
edly mixed results (Bateman, 1967; Lilly & 
Kelleher, 1973; Waugh, 1973). It must be 
noted that the failures to find modality- 
specific learning abilities have been associ- 
ated with the use of subscales of the Illinois 
Test of Psycholinguistic Abilities 
(Paraskevopolous & Kirk, 1969) to classify 
types of learners. Lilly and Kelleher (1973) 
have argued that the reliabilities of the test 
scores are sufficiently low to make them of 
doubtful utility in such investigations. Until 
better methods are employed in studies of 
younger learners, it seems appropriate to 
maintain an open mind with respect to the 
question of whether individual differences 
in learning interact with presentation mod-
ality within any school-age population.


Social problem-solving training, conducted as part of a social studies curricu-
lum with 185 third- and fourth-grade children, was assessed by a measure of
problem-solving thinking, a structured adult-child interaction, a structured 
group interaction, and a measure of locus of control. The experimental design
used four training groups: no treatment (control), video modeling tapes (tele- 
vision), video modeling tapes plus discussion exercises (discussion), and video 
modeling tapes plus role-play exercises (role play). The major findings re- 
vealed significant overall treatment effects on problem-solving thinking, the 
group interaction, and locus of control. The findings were interpreted as indi-
cating that the role-play treatment is more likely to transfer to everyday social
interactions and enhance children's social competence.


Several researchers have suggested that 
a person’s ability to engage in problem- 
solving thinking improves the ability to cope 
with everyday social problems (Allen et al., 
1976; Jahoda, 1953, 1958; Spivack & Shure, 
1974; Spivack & Levine, Note 1). Per- 
forming effectively in problematic social 
interactions places an especially heavy de- 
mand upon the individual's perceptual, 
cognitive, and motoric processes (Argyle, 
1967, 1969; Argyle & Kendon, 1967). Social 
performers must (a) accurately perceive and 
interpret social cues, (b) select an appro- 
priate strategy for reaching their goal, (c) 
implement their strategy by emitting ap-
propriate motoric responses, and (d) coor-
dinate their social performance with other
individuals by attending to the ongoing 
feedback from their social interactions (Ar- 
gyle & Kendon, 1967). It is this complex 
interaction of individual, interpersonal, and 
situational variables that differentiates ef- 
fective social problem-solving performance 
from effective problem-solving performance 
on nonsocial tasks. 

Although there is little evidence of a re- 
lationship between social coping skills and 
problem-solving performance on impersonal 
tasks (e.g., water jar problems and algebra 
problems), recent correlational studies have 
demonstrated a relationship between ratings 
of social adjustment and cognitive perfor- 
mance on hypothetical social problems 
(Platt & Spivack, 1972; Shure & Spivack, 
1972; Shure, Spivack, & Jaeger, 1971; Spi- 
vack & Shure, 1974; Spivack & Spotts, 1967; 
Spivack & Levine, Note 1; Larcen, Spivack, 
& Shure, Note 2). The findings of these 
studies combined with evidence that social 
adjustment problems of children may be 
predictive of similar problems in later life 
(Bower, 1969; Cowen et al., 1973; Stennet,
1968) have suggested the use of social prob- 
lem-solving training as a primary-prevention 
mental health strategy (Allen et al., 1976; 
Spivack & Shure, 1974). 


Copyright 1978 by the American Psychological Association, Inc. 0022-0663/78/7004-0504$00.75

Two social problem-solving interventions
present evidence that training may improve 
children’s social adjustment. Spivack and 
Shure (1974), working with inner-city chil-
dren, and Larcen (1973), working with ele-
mentary school children, independently
demonstrated that training increased the
frequency of verbal problem-solving be-
haviors emitted in response to hypothetical
social problem stories. Improvement in
teacher ratings of children's social adjust-
ment (Spivack & Shure, 1974) and a signif-
icant shift toward internality on a measure
of locus of control (Larcen, 1973) were also 
reported. Unfortunately, both studies 
inextricably confounded the treatment ef- 
fects of a combination of different training 
techniques (e.g., role playing, modeling, 
games, and dialogues) on the dependent 
variables. There is no direct evidence that 
the increased frequency of problem-solving 
behaviors leads to more effective solutions, 
which presumably mediate enhanced social 
performance and social adjustment. 

The purpose of this study is to clarify both 
the theoretical and the practical relation- 
ships between social problem solving, social 
performance, and social adjustment (or 
competence) in an elementary school setting. 
A training program was developed and in-
tegrated into the existing academic teaching
structure in order to assess (a) the relative
contributions of different training tech- 
niques in modifying social problem-solving 
performance, (b) the generalization of 
problem-solving behaviors across a number 
of different problematic social situations, 
and (c) the relationship between the fre- 
quency of problem-solving behaviors and 
solution effectiveness. Important additional 
goals were to collect follow-up data on the 
Larcen (1973) intervention and to observe 
the program's impact on the school sys- 
tem. 


Method 


School System and Subjects
grade subjects and two teachers who had participated
in the Larcen (1973) intervention were included in our 
study in order to collect follow-up data. The school was 
administratively divided into four teaching units, with 
a staff of four to five teachers responsible for the aca- 
demic curriculum of each unit. Because of the indi- 
vidually guided education program format used in the 
school, subjects from different units did not share 
classes. Subjects within a unit rotated individually 
among the different teachers according to their indi- 
vidually prescribed needs and their teachers’ skills. 

All subjects, upon enrolling in the school as third 
graders, were randomly assigned to their teaching units. 
All fourth-grade subjects in this study had been ran- 
domly assigned to their present units as third graders, 
the grade at which they participated in the Larcen 
(1973) intervention. Consequently, all subjects were 
randomly assigned to teaching units. 

Subjects were randomly distributed among four
groups: role play, discussion, television, and control.
The teaching unit used for the Larcen (1973) experi- 
mental treatment, with two of its trained teachers and 
trained subjects (current fourth graders), was retained 
for the role-play and discussion treatments in the 
present study. This teaching unit was originally ran- 
domly assigned to the experimental treatment; and the 
two teachers serving as trainers were initially selected 
from a pool of volunteers to control for voluntarism in 
the Larcen (1973) intervention. The previously trained 
fourth-grade subjects within this unit were randomly 
assigned to either the role-play or discussion treatment. 
Fourth-grade subjects in other units were randomly 
assigned to television and control treatments. All 
third-grade subjects in this year's study were randomly 
assigned to units and treatments. 


Training Curriculum and Procedures 


The classroom training was divided into six compo- 
nents based on D'Zurilla and Goldfried's (1971) prob- 
lem-solving model and Shaftel and Shaftel's (1967) 
classroom training procedures. The problem-solving 
orientation component included the following: (a) the 
expectancy that problems are a normal part of life, (b) 
the expectancy that children can solve many of their 
own problems, (c) an awareness of affect as a problem 
cue, and (d) an inhibitory set to "stop and think" before 
responding impulsively to problems. The problem 
identification component included the following: (a) 
discriminating relevant from irrelevant elements of the 
problem, (b) the importance of accurately identifying 
the problem, and (c) setting short- and long-term goals. 
The alternative solutions component encouraged the 
subjects to generate many possible solutions to each 
problem. The consideration-of-consequences com- 
ponent taught the child to consider the potential ob- 
stacles and opportunities associated with each solution 
before deciding on a course of action. The elaborations 
component emphasized the concrete, step-by-step 
process of planning necessary to implement a solution. 
Finally, the integration component presented all of 
these social problem-solving features as a unified set of 
strategies that one might apply to a problem.

Six narrated videotapes, using child actors of the
subjects’ approximate age, were developed to model
components of the social problem-solving process.
Each videotape introduced a new problem-solving
component to the curriculum (e.g., problem identification
and outcomes).
In-service workshops. Two classroom teachers and
six female undergraduate assistants participated in 6
six female undergraduate assistants participated in 6 
weekly 1-hour-long in-service trainers’ workshops 
conducted by the experimenter. A social problem- 
solving workbook presented the rationale, lesson plan, 
and materials for implementing the in-class training. 
The workbook materials were a series of incomplete 
social problem stories to be used in small-group discussion
and role-play exercises. The workshop sessions
were used to elaborate, practice, and modify the in-class 
training presented in the social problem-solving work- 
book. Time was reserved to discuss problems en- 
countered in applying previous in-class problem-solving 
exercises and to generate possible solutions to these 
problems for use in future classroom training. 
In-class training conditions. Role-play, discussion, 
and television groups were introduced in class to a new 
problem-solving component each week by one of the six 
modeling videotapes. Role-play and discussion 
subjects engaged in small-group exercises on the school 
day following exposure to the modeling tape. Subjects 
met in groups of four to five male and female peers and 
were read a problem story by a teacher or an assistant. 
Both discussion and role-play small groups were focused 
on the same series of problem stories to control for 
content. Discussion groups were asked to tell how they 
would cope with the problematic situations presented 
in the story. Role-play groups were asked to role play 
alternative enactments they thought might solve the 
problem. The teachers and assistants guided the dis- 
cussion and role-play exercises to emphasize the prob- 
lem-solving component corresponding to the appro- 
priate videotape. The television groups participated 
in their regular social studies curriculum (e.g., American 
and state history) on the school day following exposure 
to the modeling tapes and served as a control for the 
training effects of the modeling tape alone. A control 
group received no training beyond their regular social 
studies curriculum. 


Assessment Procedure


The assessment procedures used in this study were
based upon Goldfried and D’Zurilla’s (1969) “behavior
analytic method,” which includes (a) a situational
analysis of the subject's problematic environment, (b) 
enumeration of potential responses to the problematic 
environment, and (c) evaluation of the effectiveness of 
these responses. 

Situational analysis. The situational analysis was 
accomplished by collecting written and verbal accounts 
of problematic social situations frequently encountered 
by the subjects from their teachers and the subjects 
themselves. Three experimental analogues based upon 
these problematic social situations were then developed 
to provide for controlled observation of the subjects’ 
problem-solving responses and effectiveness. 

The first experimental analogue was the “Problem-
Solving Measure” developed by Larcen (1973) from the
means-ends problem-solving technique of Shure and
Spivack (1972) to assess children’s problem-solving 
thinking. The measure consists of open-middle stories 
that begin by describing a child engaged in a prob- 
lematic social situation and end with the child resolving 
the problem. Each subject was individually adminis- 
tered the Problem-Solving Measure outside of class by 
an experimenter, who read the stories to the subject and 
asked the child to “tell what happens in between.” The
subjects’ responses were recorded in writing by the ex- 
perimenter. 

The Problem-Solving Measure pretest and the 
posttest version differed both in content and in the 
number of stories. The Problem-Solving Measure 
pretest was composed of two open-middle stories used 
by Larcen (1973). The Problem-Solving Measure 
posttest was composed of three open-middle stories
developed from problematic situations identified in the 
situational analysis. 

The “dyad interaction” was an experimental analogue
identifying problematic peer, teacher, and environmental
situations. The dyad interaction was a more in
vivo interaction between a child and a mother figure
(experimenter's confederate). The mother, introduced
as a visitor to the school, presented three problems her 
own child feared he or she would encounter upon en- 
tering the subject's school. The sex of the mother's 
child was altered to match the sex of the subjects. The 
three problems were (a) getting lost on the bus or in 
school (environmental problem), (b) academic and 
teacher problems (teacher problem), and (c) making 
friends at school (peer problem). The mother would 
usually mention one of the three problems to the subject 
and, if necessary, administer one, two, or three prods to 
encourage the subject to respond. The prods ranged
in intensity from (a) repeating the problem, through (b) stating
concern about the problem, to (c) asking the subject
what he or she would do to solve the problem. The
more prods required to make the subject respond, the 
less sensitive the child was assumed to be to the prob- 
lem. After the subject responded to each of the three 
problems, the mother presented him or her with up to 
three prearranged obstacles to any problem solutions 
the subject had suggested. If the subjects overcame the 
first obstacle, the mother administered a second obstacle
and similarly gave the third obstacle. Persistency
was measured by the number of obstacles administered
to the subject. These procedures were repeated with
all three problems. The verbal interaction was recorded
unobtrusively by a cassette tape recorder.
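
The dyad scoring rules described above can be illustrated with a minimal sketch. It assumes that prods and obstacles are summed over the three problems, that no obstacle is given when the child offers no solution, and that each obstacle is presented only if the previous one was overcome. The function names and the record layout are hypothetical and are not taken from the study's scoring manual.

```python
def obstacles_administered(gave_solution, overcame, max_obstacles=3):
    """Count obstacles presented for one problem.

    `overcame` lists, in order, whether the child would overcome each
    prearranged obstacle. The next obstacle is presented only when the
    previous one was overcome (assumed reading of the procedure).
    """
    if not gave_solution:
        return 0  # assumption: obstacles follow only a suggested solution
    administered = 0
    for solved in overcame[:max_obstacles]:
        administered += 1
        if not solved:
            break
    return administered


def dyad_scores(problems):
    """Return (total prods, total obstacles) across the problems."""
    prods = sum(p["prods"] for p in problems)
    obstacles = sum(
        obstacles_administered(p["gave_solution"], p["overcame"])
        for p in problems
    )
    return prods, obstacles


# Hypothetical session covering the environmental, teacher, and peer problems.
session = [
    {"prods": 0, "gave_solution": True, "overcame": [True, True, False]},
    {"prods": 2, "gave_solution": True, "overcame": [False, False, False]},
    {"prods": 3, "gave_solution": False, "overcame": [False, False, False]},
]
prods, obstacles = dyad_scores(session)
```

Under these assumptions a higher prod total indicates lower problem sensitivity, and a higher obstacle total indicates greater persistency.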

The “Friendship Club interaction” was an experi- 
mental analogue of problematic peer group situations 
based on the situational analysis. Subjects were asked 
to participate in a contest called Friendship Club. 
They were taken in groups to a large room in the school, 
which was partitioned into an equipment area and a
television studio. The equipment area contained a
video monitor and a videotape recorder. The television
studio contained a television camera and tripod, a microphone,
a large table, and five chairs. A stack of five
office title cards (e.g., president and secretary), inscribed
by magic markers, was placed on the table. A
sixth chair was in the television studio, next to the
camera and opposite the table. After the subjects were
given a brief demonstration of the television equipment,
they were told that they were one of several teams in a
contest about how children can learn to make friends. 
An award was promised to the team that gave the best 
answer to the contest question. They would be given 
a practice question and then the actual contest question. 
The only rules they had to follow were that (a) all six 
members must help answer the question, (b) all six 
members must agree on the team's best answer, and (c) 
all six members must be club officers. The subjects 
were then left alone in the television studio. In addition 
to the explicitly stated interpersonal problem-solving 
task (i.e., the practice and contest problems), the
subjects were confronted with a number of problems 
embedded in the Friendship Club interaction setting. 
These additional problems were the following: (a) five 
chairs for six subjects (chair problem), (b) five officer 
cards for six subjects (missing role problem), and (c) the 
process of distributing officer titles (role distribution 
problem). The Friendship Club interaction was vid- 
eotaped as part of the contest procedure. 

Enumeration of problem-solving responses. Enu- 
meration of problem-solving responses was done by 
trained raters scoring directly from the Problem-Solving 
Measure written protocol, the dyad interaction audio- 
tape recording, and the Friendship Club interaction 
video recording, respectively. Subjects’ responses were 
scored according to Larcen’s (1973) Problem-Solving 
Measure manual for three problem-solving components: 
the number of alternative solutions generated (alter- 
natives), the number of specific steps elaborated for 
each solution (elaborations), and the number of obsta- 
cles and consequences associated with each solution 
(outcomes). 

In addition, dyad interaction audiotapes were scored 
for problem-sensing and problem-solving persistency 
by counting the number of prods (prods) and the 
number of experimentally administered obstacles 
(obstacles), respectively. Subjects’ solutions of ob- 
stacles were not scored as alternatives. This procedure 
assured that any covariation of alternatives and ob- 
stacles was possible, as long as the subject gave at least 
one alternative to a problem. 

Solution effectiveness. Response evaluation con- 
sisted of submitting a brief description of each prob- 
lematic situation imbedded within the three experi- 
mental analogues and their respective subject responses 
(i.e., alternatives) to a panel of judges. The judges (10 
volunteer graduate students in psychology) were also 
given a description of the subjects and of the experi- 
mental analogues in which the subjects were responding. 
All alternatives (minus duplications) to the problematic 
situations of all subjects across treatment conditions 
were randomly presented following the description of 
each problematic situation. The judges were instructed 
first to read all alternatives associated with each prob- 
lematic situation and then to rate each solution on a 
7-point effectiveness scale (inferior to extremely effective).
Effectiveness was defined as meaning the
subject’s solution solves the problem while maximizing
positive consequences and minimizing negative consequences
(social as well as personal consequences;
Goldfried & D'Zurilla, 1969). The mean rating that this
panel of judges gave each solution was then used by two
raters to score the effectiveness of each subject’s solution
of the Problem-Solving Measure, dyad interaction,
and Friendship Club interaction. 
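
The effectiveness scoring described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: the function names, the example solutions, and the ratings are hypothetical, only three judges are shown where the study used ten, and scoring a subject with no alternatives as zero is an assumption.

```python
from statistics import mean


def solution_scores(judge_ratings):
    """Map each distinct solution to the mean of its judges' ratings (1-7 scale)."""
    return {solution: mean(ratings) for solution, ratings in judge_ratings.items()}


def subject_effectiveness(subject_solutions, scores):
    """Return (average, best) effectiveness over the solutions a subject gave."""
    values = [scores[s] for s in subject_solutions]
    if not values:
        return 0.0, 0.0  # assumption: no alternatives is scored as zero
    return mean(values), max(values)


# Hypothetical ratings from three judges for two distinct solutions.
ratings = {
    "ask the teacher for help": [6, 5, 7],
    "walk away": [3, 2, 4],
}
scores = solution_scores(ratings)
average, best = subject_effectiveness(
    ["ask the teacher for help", "walk away"], scores
)
```

The sketch returns both the average and the most effective rating because the Results section compares the two before reporting only the average.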

Locus of control. The subjects’ locus of control was 
measured by the Norwicki-Strickland (1973) Locus of 
Control Scale for Children. Research findings with the 
Norwicki-Strickland internal-external instrument 
indicate that children's locus of control is related to 
delay of gratification, self-initiated behavior, academic 
achievement, and self-esteem (Strickland, 1972, 1973). 
Allen et al. (1976) found locus of control to be a possible 
mediating variable in children's social problem-solving 
thinking. 

Lorge-Thorndike Intelligence Tests. The Lorge-
Thorndike (Lorge & Thorndike, 1964) Intelligence
Tests are a standardized measure of children's verbal 
and nonverbal intelligence. This measure was ad- 
ministered to assess the relationship between intelli- 
gence and social problem-solving behavior. 


Overall Program Procedure 


Data from this study were collected during 1973- 
1974. A Solomon (1949) design was used with third- 
grade subjects to assess possible pretesting effects. All 
fourth-grade subjects were pretested on the Problem- 
Solving Measure in order to obtain adequate follow-up 
data on the Allen et al. (1976) interventions. Subjects
were asked by their teachers to talk to a man from the 
university about some children’s stories. One of five 
randomly assigned experimenter assistants, blind to 
treatment conditions, took the subjects individually 
from their classes to one of five small testing rooms 
provided by the school. After reading the Problem- 
Solving Measure stories and recording the subject’s 
responses, the assistant returned the subject to his or 
her classroom. All subjects were pre- and posttested 
in class with the group-administered Norwicki- 
Strickland internal-external scale. 

Subjects in the role-play and discussion treatments 
were randomly assigned to small training groups of five 
to six male and female students. Each group was led 
by a teacher or an assistant. A balancing system was 
developed that controlled teacher differences in the 
role-play and discussion treatment groups, so that each 
group leader led each role-play and discussion group an 
equal number of sessions throughout the training program.
The 10 discussion and role-play sessions each lasted
about 30 minutes.

About 5 to 10 days after training was completed, all 
subjects were posttested with the group-administered 
Norwicki-Strickland scale and the individually ad- 
ministered Problem-Solving Measure posttest and the 
dyad interaction. To control for order effects, a coun- 
terbalancing procedure was used with the dyad inter- 
action and Problem-Solving Measure posttest: Half 
of the subjects were randomly assigned to take the 
Problem-Solving Measure posttest first and half to take 
the dyad interaction first. IQ assessment was group 
administered in class 3 months after training was 
completed.
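
The counterbalancing of the Problem-Solving Measure posttest and the dyad interaction described above amounts to a random half split of the subjects. A minimal sketch follows. The function name, the order labels, and the seed are hypothetical, and the study does not report how the random split was actually generated.

```python
import random


def counterbalance(subject_ids, orders=(("PSM", "dyad"), ("dyad", "PSM")), seed=None):
    """Randomly assign half of the subjects to each administration order."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # First half takes the Problem-Solving Measure first, the rest the dyad first.
    plan = {sid: orders[0] for sid in shuffled[:half]}
    plan.update({sid: orders[1] for sid in shuffled[half:]})
    return plan


# Hypothetical roster of 20 subject numbers.
plan = counterbalance(range(1, 21), seed=7)
```

With an odd number of subjects this sketch places the extra subject in the second order, a choice made here only for illustration.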

Subjects were recruited for the dyad interaction by 
an experimenter, who announced in class that a man 
from the university would be talking to some children 
and their mothers about school. One of five randomly
assigned assistants later took the subjects individually
from their classrooms to a waiting room, where the 
mother was also waiting to talk to the man about school. 
The dyad interactions lasted from 8 to 15 minutes be- 
fore the children were interviewed by the man and re- 
turned to class. 

Two months after the training was completed, the 
Friendship Club interaction was implemented. Prior 
to the Friendship Club interaction, teachers announced 
in class that some people from the university would be 
taking volunteers to participate in a contest. Later, one 
of three randomly assigned assistants, blind to treat- 
ments, picked up groups of six subjects from class and 
took them to the studio to play Friendship Club. 
Subjects were randomly assigned, within their treat- 
ments, to these small groups of six same-sex, half third- 
and half fourth-grade subjects. To control for possible 
effects of shared training history, subjects were not 
placed in the Friendship Club interaction small groups 
with other subjects who had been members in their 
classroom training exercise small group. The Friend- 
ship Club interactions lasted from 10 to 18 minutes. 
Subjects were thanked for their participation and re- 
turned to their classroom. After the Friendship Club 
interactions, videotapes were scored, and Friendship
Club awards were in fact presented during a special
ceremony to the group rated as having the most effective
solution. Teachers, administrators, and the experimental
team also shared feedback on the intervention
at this time.


Results


Problem-Solving Assessment


L. McCLURE, J. CHINSKY, AND S. LARCEN


Interrater reliability. Problem-Solving
Measure pretest and posttest problem-solving
responses were scored independently
by two raters, blind to treatment conditions,
with a Pearson interrater reliability on 15
overlapped Problem-Solving Measure protocols
of alternatives (r = .91), outcomes (r
= .88), and elaborations (r = .88). Seven
raters independently scored 164 dyad interactions
for problem-solving components,
with an intraclass correlation (Snedecor,
1946) on 16 randomly selected overlapped
dyad interactions of alternatives (r = .93),
outcomes (r = .92), and elaborations (r =
.94), respectively. Interrater percentage of
agreement between two raters was 96% for
dyad interaction problem sensing (prods)
and 98% for dyad interaction problem-solving
persistency (obstacles). Two raters
independently scored 16 Friendship Club
interactions for problem-solving components,
with an interrater reliability on a random
selection of six overlapped Friendship Club
interactions of alternatives (r = .84), outcomes
(r = .62), and elaborations (r = .89),
respectively. Problem-Solving Measure,
dyad interaction, and Friendship Club interaction
solution effectiveness were rated
independently by two raters whose agreement
was 97%. Because of the high correlation
found between subjects' average solution
effectiveness and their most effective
solution rating in all three assessment settings
(i.e., Problem-Solving Measure, r = .94;
dyad interaction, r = .99; and Friendship
Club interaction, r = .99), only average effectiveness
results will be reported.
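
Two of the reliability indices reported above, the Pearson correlation between two raters and their percentage of agreement, can be computed as in the following sketch. The rater scores are hypothetical, the function names are illustrative, and the intraclass correlation used for the seven dyad raters is not covered here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two raters' scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def percent_agreement(x, y):
    """Percentage of protocols on which the two raters gave identical scores."""
    matches = sum(a == b for a, b in zip(x, y))
    return 100.0 * matches / len(x)


# Hypothetical alternatives counts from two raters on six overlapped protocols.
rater_a = [3, 5, 2, 6, 4, 7]
rater_b = [3, 4, 2, 6, 5, 7]
r = pearson_r(rater_a, rater_b)
agreement = percent_agreement(rater_a, rater_b)
```

The two indices answer different questions: the correlation can be high even when raters rarely assign identical scores, which is one reason a study might report agreement separately for count measures such as prods and obstacles.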

Problem-Solving Measure. Analysis of 
Problem-Solving Measure pretest data, 
collected over 4 months after completion of 
the Allen et al. (1976) intervention, failed to 
reveal any long-term training effects on al- 
ternatives, outcomes, elaborations, or ef- 
fectiveness. The effectiveness of training in 
the present study was supported by the 
Problem-Solving Measure posttest data. 

Analysis of variance revealed that trained
subjects’ (i.e., role-play, discussion, and
television treatments combined) alternatives
were significantly more numerous (M =
5.22), F(1, 183) = 4.13, p < .04, and more
effective (M = 4.89), F(1, 183) = 3.94, p <
.05, than control subjects’ alternatives (M = 4.47;
effectiveness M = 4.49). In addition,
trained subjects generated significantly
fewer words (M = 147) than control subjects
(M = 200), F(1, 183) = 7.01, p < .009, indicating
that the trained group's performance
was not simply an artifact of amount of verbalization.
One unexpected finding was
that control subjects generated significantly
more outcomes (M = 4.86) than trained
subjects (M = 2.58), F(1, 183) = 4.01, p <
.001. A moderately strong correlation found
between outcomes and the number of words
spoken (r = .70) suggests that this effect may
be largely an artifact of amount of verbalization.
No treatment effects were revealed
for elaborations.
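
The trained-versus-control comparisons above are one-way analyses of variance with two groups, which is why each F ratio has 1 and 183 degrees of freedom (185 subjects, two groups). The sketch below shows how such an F ratio is formed. The scores are hypothetical and the function name is illustrative; it does not reproduce the study's data or compute a p value.

```python
from statistics import mean


def one_way_f(*groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical alternatives counts for six trained and six control subjects.
trained = [6, 5, 7, 4, 6, 5]
control = [4, 5, 3, 5, 4, 3]
f_value, df1, df2 = one_way_f(trained, control)
```

The same function accepts four groups, which corresponds to the separate-treatments analyses with 3 numerator degrees of freedom reported in the next paragraph.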

Table 1
Means and Standard Deviations of Problem-Solving Performance in the Problem-Solving
Measure Posttest, the Dyad Interaction, and the Friendship Club Interaction

                              Role play     Discussion    Television      Control
Problem-solving performance      M     SD      M     SD      M     SD      M     SD
Problem-Solving Measure
  Alternatives                5.34   2.40   5.52   2.93   4.41   2.31   4.47   1.85
  Outcomes                    2.76   2.15   2.40   3.43   2.61   2.46   4.86   4.86
  Elaborations                3.78   3.24   3.17   3.16   2.87   3.14   2.84   2.69
  Effectiveness               5.16   1.08   4.71   1.62   4.69   1.25   4.49   1.14
  Words                        150    101    153    140    131    100    200    143
Dyad interaction
  Alternatives                2.48   1.64   2.06   1.53   2.53   1.23   2.11   1.64
  Outcomes                     .59     17    .56   1.09    .65    .79    .64    .82
  Elaborations                 .16   1.14    .84   1.77   1.76   2.63    .56    .87
  Effectiveness               3.24   2.31   2.54   1.94   3.31   1.49   2.60   1.93
  Obstacles                   4.29   3.96   3.72   3.49   4.71   3.04   3.89   3.51
Friendship Club interaction
  Alternatives                1.88   1.33   1.54   1.89   1.54   1.50   1.26    .94
  Outcomes                    1.50   1.53   1.38   1.70   1.46   2.31   1.26   1.54
  Elaborations                4.46   5.86   1.42   2.10   3.08   3.37   2.26   3.67
  Effectiveness               1.74   1.21   1.39   1.61   1.36   1.38   1.45   1.10


Another series of analyses, comparing the
role-play, discussion, television, and control
groups as four separate treatments, revealed
only trends for alternatives, F(3, 181) = 2.56,
p < .06, effectiveness, F(3, 181) = 2.55, p <
.06, and number of words, F(3, 181) = 2.49,
p < .06. The means and standard deviations
of these Problem-Solving Measure
posttest analyses, as well as the dyad interaction
and Friendship Club interaction
analyses, are presented in Table 1. Duncan’s
range test revealed that (a) the discussion
group generated significantly more
alternatives than the television treatment, 
(b) the role-play treatment generated sig- 
nificantly higher effectiveness than the 
control treatment, and (c) the control 
treatment generated significantly more 
words than the television treatment. No 
other treatment effects or sex effects were 
revealed with the Problem-Solving Measure. 
In addition, analysis of third-grade subjects’ 
Problem-Solving Measure posttest re- 
sponses revealed no Problem-Solving Mea- 
sure pretesting effects. 

Dyad interaction. No significant treat- 
ment or Treatment X Situation interaction 
effects were revealed with repeated measures 
analysis of variance, using the three prob- 
lematic dyad interaction situations as the 
repeated factor and comparing all trained 
subjects as one group with control subjects 
for alternatives, outcomes, elaborations, effectiveness,
prods, or obstacles. Treatment
X Sex X Situation repeated measures analysis
of variance, using the three problematic
dyad interaction situations as the repeated 
factor and comparing the role-play, discus- 
sion, television, and control groups as four 
separate treatments, revealed significant 
elaborations treatment, F(3, 140) = 4.96, p 
< .004, and Sex X Treatment interaction 
effects, F(1, 140) = 3.79, p < .002. Duncan's
range test of the Treatment X Sex group 
means revealed that the television treatment 
female group generated significantly more 
elaborations than each of the other male and 
female comparison groups. No significant 
treatment, Treatment X Sex, or Treatment 
X Situation interaction effects were revealed 
with alternatives, outcomes, effectiveness, 
prods, or obstacles. 

Stepwise multiple regression analysis,
using alternatives, locus of control, and IQ
as predictors, selected alternatives for both
prods (r² = .37), F(1, 114) = 65.72, p < .001,
and obstacles (r² = .65), F(1, 114) = 208.14,
p < .001. As expected, the number of alternatives
was negatively related to prods (r
= -.61) and positively related to obstacles
(r = .81).


Figure 1. Mean number of alternative solutions subjects suggested to problematic social situations
depicted in a contest called Friendship Club. (Vertical axis: mean alternatives. Situations shown:
role distribution, chair, contest, missing role, and practice.)


Friendship Club interaction. Repeated
measures analysis of variance, using the five
problematic situations as the repeated factor
and nested on the Friendship Club interaction
contest assessment groups, revealed a
significant alternatives Treatment X Situation
interaction effect, F(12, 48) = 1.94, p
< .05. A significant situation effect, F(4, 48)
= 23.06, p < .001, and a significant Treatment
X Situation interaction effect, F(12,
48) = 3.42, p < .002, were also revealed for
elaborations.¹ Friendship Club interaction
treatment means and standard deviations
are presented in Table 1.

Figures 1 and 2 present the Friendship
Club interaction Treatment X Situation
group means for alternatives and elaborations,
respectively. Duncan's range test of
all Treatment X Situation alternative group
means revealed that the role-play
group mean in the practice situation was
significantly higher than all other Treatment
X Situation group means, except the
discussion/role distribution and the
television/contest means. Duncan's
range test of all elaborations Treatment X
Situation group means revealed that (a) the
role-play/role distribution group mean was
significantly higher than all other Treatment
X Situation group means, (b) the control/role
distribution group mean was significantly
higher than the television/role distribution
group mean, and (c) the television/contest
group mean was significantly
higher than the discussion/contest and the
control/contest group means.
Additional Duncan range tests revealed
other significant alternatives and elaborations
differences when comparing performance
in one situation with performance in
a different situation, but none of these
formed a consistent pattern.

Predicting solution effectiveness. 
Stepwise multiple regression analyses, using 
alternatives, outcomes, and elaborations as 
predictors of effectiveness, selected alter- 
natives for the Problem-Solving Measure, 
F(1, 157) = 61.08, p < .001, the dyad inter- 
action, F(1, 162) = 407.42, p < .001, and the 
Friendship Club interaction, F(1, 166) = 
916.80, p < .001, respectively. The per- 
centage of effectiveness accounted for by 
alternatives was 28% in the Problem-Solving 
Measure, 72% in the dyad interaction, and 
85% in the Friendship Club interaction. 
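
The percentages of variance reported above can be cross-checked against the F ratios. For a regression with a single predictor, F(1, df) = R² · df / (1 − R²), so R² = F / (F + df). The sketch below applies this identity to the three reported F ratios. It assumes each reported F tests alternatives as the only predictor entered, and the function names are hypothetical.

```python
def f_from_r_squared(r_squared, df_error):
    """F ratio with (1, df_error) degrees of freedom for a one-predictor regression."""
    return r_squared * df_error / (1.0 - r_squared)


def r_squared_from_f(f_value, df_error):
    """Proportion of variance accounted for, recovered from F(1, df_error)."""
    return f_value / (f_value + df_error)


# F ratios and error degrees of freedom as reported in the text.
reported = {
    "Problem-Solving Measure": (61.08, 157),
    "dyad interaction": (407.42, 162),
    "Friendship Club interaction": (916.80, 166),
}
variance_explained = {
    setting: round(100 * r_squared_from_f(f_value, df_error))
    for setting, (f_value, df_error) in reported.items()
}
```

Under that assumption the recovered percentages round to 28, 72, and 85, in line with the values stated in the text.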

¹ No analysis of variance comparing trained (i.e.,
role-play, discussion, and television treatments combined)
with control subjects was done in the Friendship
Club interaction because the statistical requirements,
in nested designs, of equating the number of nested
assessment groups in each treatment would have resulted
in comparing two treatments, each consisting of
only three groups. In addition, the trained group would
have been drawn from three different experimental
treatments.


Problem-solving cross-situational consistency.
Correlations of alternatives, outcomes,
elaborations, and effectiveness
among the Problem-Solving Measure, dyad
interaction, and Friendship Club interaction
assessment settings revealed significant,
though weak, correlations between the
Problem-Solving Measure and dyad interaction
outcomes (r = .25), the Problem-Solving
Measure and Friendship Club interaction
outcomes (r = -.19), and the
Friendship Club interaction and dyad interaction
effectiveness (r = .17). No other
significant correlations were revealed.

Locus of control. Analysis of variance of
the Norwicki-Strickland internal-external
scale pretreatment data, comparing previously
trained fourth graders with fourth-grade
controls, revealed no significant
long-term effects of Larcen’s (1973) training,
F(1, 55) = .54, p < .47. However, Treatment
X Grade analysis of variance, adjusted on
pretest locus of control means, revealed
significant treatment effects on posttest
locus of control, F(3, 133) = 4.00, p < .01.
Duncan's range test indicated that fourth-grade
controls were significantly more external
than the fourth-grade television, discussion,
or role-play groups, and that third-grade
controls were significantly more external
than either the fourth-grade television
or the fourth-grade discussion treatments.
No significant grade or Treatment X Grade
effects were revealed.


Figure 2. Mean number of solution elaborations subjects suggested to problematic social situations
depicted in a contest called Friendship Club. (Vertical axis: mean elaborations.)

Intelligence test. An analysis of variance 
comparing the four treatment groups' 
Lorge-Thorndike total IQ test scores failed 
to reveal significant treatment differences, 
F(3, 178) = .64, p < .50. 


Discussion 


This study replicates Larcen's (1973) 
findings that social problem-solving training 
with young children increases the frequency 
of solutions generated to hypothetical social 
problems and increases the children's ex- 
pectancy of acquiring reinforcements from 
the environment. Problem-Solving Mea- 
sure findings were extended in this study by 
demonstrating that trained subjects’ solu- 
tions were more effective than those of con- 
trols. Subsequent analyses of the Prob- 
lem-Solving Measure data did not, however, 
provide unequivocal evidence supporting the 
relative efficacy of any one training tech- 
nique, although the role-play and discussion 
treatments tended to be superior to the 
television and control treatments. Also, no 
long-term effects of Larcen's intervention 
were revealed with the Problem-Solving 
Measure or locus of control data collected 6
months after the completion of his program.

The acquisition of improved problem-
solving thinking demonstrated with the 
Problem-Solving Measure data was not re- 
flected with the dyad interaction. No sig- 
nificant treatment effects were found when 
comparing the combined trained group with 
the controls. The significant treatment ef- 
fect found with elaborations when compar- 
ing the four separate treatments was unex- 
pected because the television females’ per- 
formance was superior to all other male and 
female treatment comparisons. Whether 
this finding is spurious or the outcome of 
some meaningful interaction between 
training (e.g., differences in experienced or 
observed problem-solving consequences), 
subject’s sex, and assessment setting (e.g., 
experimenter’s sex) cannot be determined 
from the dyad interaction data. The dyad 
interaction did provide the theoretically 
important finding that the frequency of al- 
ternative solutions generated by a subject is 
predictive of both measures of problem sensitivity
and problem-solving persistency.
Significant Treatment X Situation inter- 
action effects were revealed in the Friend- 
ship Club interaction. Post hoc compari- 
sons of Treatment X Situation group means 
indicated the superiority of the role-play 
treatment alternatives and elaborations 
performance, while differences between the 
discussion, television, and control treat- 
ments varied greatly depending upon the 
situation. The Friendship Club interaction 
findings are especially important because the 
assessment took place over 2 months after 
the training was completed, with the school's 
Christmas break intervening. In addition, 
the Friendship Club interaction involved five 
different in vivo problematic peer group 
situations based on the important theme of 
inclusion-exclusion, where with the excep- 
tion of the contest problem, no adult was 
present. 

The discrepancy between the acquisition 
of more productive and more effective 
problem-solving thinking demonstrated with 
the Problem-Solving Measure and the 
varying performance or nonperformance of 
problem-solving behaviors across a variety 
of problematic situations is consistent with 
human “discriminative facility” emphasized
by Mischel (1976). Whether or not we
choose to implement a cognitive problem-solving
strategy through observable motoric
performance depends largely upon the outcomes
we expect in any given situation.
Although training did increase subjects'
generalized expectancy of controlling reinforcers
in the environment, no data are
available on the subjects' specific expectancies
in the assessment situations. It is
encouraging that improved problem-solving
behaviors were available to the trained
subjects and, depending upon the assessment
situation, that one, two, or all three of
the trained groups performed better than the
controls.

These findings also caution that improvements
in subjects’ ability to solve hypothetical
problems do not necessarily
transfer to real-life problem solving. Although
the Problem-Solving Measure failed
to distinguish between the effectiveness of
the three different training techniques, the
Friendship Club interaction findings suggest
that the role-play treatment is most likely to
transfer to in vivo problem solving in social
interactions with peers. In addition, the
theoretically important finding that the
number of solutions generated by a subject
is strongly predictive of solution effectiveness
in the Problem-Solving Measure, the
dyad interaction, and the Friendship Club
interaction settings, respectively, is supportive
of the potential value of problem-solving
training for increasing the individual's
social competence.

Some of the long-term system effects of
both the Larcen (1973) intervention and the
present one are (a) expansion of the training
program to the remainder of the elementary
school, (b) initiation of a new research program
with younger children and their parents
in an adjacent elementary school, and
(c) a planned follow-up of the long-term effects
of the problem-solving training with
our subjects, who are now fifth and sixth
graders. Perhaps most indicative of the
program's impact is the fact that members
of the school staff and two of the project's
undergraduate assistants continued and
expanded the problem-solving training
program to the entire school during the year
following our research study. True to the
problem-solving orientation, they combined 
and improved elements of Larcen’s (1973) 
training program with elements of our 
training program and many of their own 
ideas for a more versatile and interesting 
training package (see Allen et al., 1976). 

In conclusion, the findings of this study, 
considered with the findings of previous 
studies (e.g., Allen et al., 1976; Spivack & 
Shure, 1974), indicate that social problem- 
solving training is a promising approach to 
enhancing children's social competence and
primary prevention. Future community 
interventions of longer duration involving 
additional social systems (e.g., the family and 
the criminal justice systems) that affect 
specific target populations may be expected
to have a greater impact and should be encouraged.
Note Taking as a Generative Activity


Richard J. Peper and Richard E. Mayer 
University of California, Santa Barbara 


Three experiments investigated the effects of note taking on “what is learned”
from videotaped lectures. In Experiments 1 and 2, there was a pattern in
which note takers excelled on far-transfer test items, while non-note takers excelled
on near transfer, especially for low-ability subjects. In Experiment 3,
there was a pattern in which the recall protocols of note takers contained more
idea units concerning underlying concepts and more intrusions concerning
other relevant concepts, while non-note takers did better at recalling technical
symbols and examples and produced more vague summaries and connectives.
The results suggest that note taking can result in a broader learning outcome,
rather than just more learning overall, because an assimilative encoding process
is encouraged.


Past research has not been in agreement 
as to whether or not notes aid in the reten- 
tion of new material. Some researchers 
(Arnold, 1942; Eisner & Rhode, 1959; Fisher 
& Harris, 1974; Howe, 1970; Idstein & 
Jenkins, 1972; Peters, 1972; Todd & Kessler, 
1971) have found note-taking activities not 
to be beneficial to the student, while others 

(Aiken, Thomas, & Shennum, 1975; Di Vesta 
& Gray, 1972, 1973) have found that note 
taking leads to an overall superior perfor- 
mance on retention of new materials. Oth- 
ers (Fisher & Harris, 1973; Fowler & Barker, 
1974; Annis & Davis, Note 1, Note 2) have 
found note-taking activities to aid recall in 
some situations, such as when students were 
allowed to review their notes or when stu- 
dents were allowed to use their preferred 
strategy. In a recent review, Faw and Waller 
(1976) noted, as Peters (1972) had earlier, 
that note taking is more likely to result in 
better overall recall when the presentation 
rate is slow and when subjects can review 
their notes. Faw and Waller further point 
out that the effect of note taking has just 
recently begun to receive much research at- 
tention and that we are far from a useful 
theory of note taking. 

The present study is an attempt to move 
toward development of a theory of note 
taking that might be useful in clarifying the 
existing conflicts and in making prescrip- 
tions concerning instructional practice. One 
weakness of most of the prior research on 
note taking has been that learning outcomes 
were evaluated solely in terms of how much 
is retained, often using recall or fact recog- 
nition tests. The present study attempts to 
add new information by investigating what
is learned; this question involves a more 
detailed analysis of the learning outcome, 
including tests for differences in transfer and 
the determination of which idea units are 
recalled by note takers and non-note tak- 
ers. 
Faw and Waller (1976), following an ear- 
lier distinction by Di Vesta and Gray (1972, 
1973), have distinguished between two po- 
tential functions of note taking: as an aid in 
encoding and as an external storage for later 
review. The present article will deal only 
with the encoding function, since it holds the 
major theoretical interest. Rothkopf (1970) 
has coined the term mathemagenic activity
to refer to any overt activity that determines
in some way the outcome of learning. In
short, the present article attempts to investigate
how the mathemagenic activity of note
taking influences posttest performance; it
does this by investigating the effects of note
taking on the internal cognitive process of
encoding and on the structure of the resultant
learning outcome.
The work of Ausubel (1968) encourages a
distinction between two types of encoding
processes, that of assimilating new information
to a set of meaningful structures or
that of adding new information as arbitrary
associations. Mayer (1975) has suggested
that the adding type of encoding process
requires only one major condition be met—
the material must be received (Condition 1); 
however, assimilative encoding requires that 
three conditions be met—the material must 
be received (Condition 1), a meaningful set 
of prior experiences must be available to the 
learner (Condition 2), and the learner must 
actively process that set of experiences 
during learning (Condition 3). Further, 
these two different types of learning result 
in different outcomes as measured by re- 
tention and transfer (Mayer, 1975). 

What is the effect of note taking on the 
type of encoding process that is used and the 
resultant type of learning outcome that is 
acquired? One theory proposed by Frase 
(1970) is that mathemagenic activities such 
as note taking serve to increase the subject’s 
overall attention and orientation to the new 
material; this could be called the attention 
theory. Another related idea is that note 
taking requires more effort and that material
that requires deeper levels of activity is 
encoded more deeply; this could be called the 
effort theory and is similar to Craik and 
Lockhart’s (1972) principle of “levels of 
processing.” Both of these theories predict 
that on the average, note-taking subjects 
should perform better on all measures as 
compared to non-note takers, since they 
simply learn more (or learn more strongly).
Thus, the attention/effort theory predicts a
quantitative difference in how much is
learned. It is based simply on Mayer’s
(1975) Condition 1, namely, how much information
is received.

A contrasting theory is based on the idea 
that note taking encourages learners to ac- 
tively integrate the new information within 
their own past experiences because subjects 
are required to paraphrase, organize, and 
make sense out of the presented material. 
This could be called the generative theory 
and is similar to that described by Ausubel 
(1968) as “meaningful learning” or by Wit- 
trock (1974) as “generative” learning. More 
recently, educators have taken an interest in 
“elaborative” techniques (Klemt & Anderson,
1973; Lynch & Rohwer, 1971; Royer &
Kulhavy, 1973; Diekhoff, Note 3) that en- 
courage this process of active integration of 
old knowledge with new. The generative 
theory relies on all three of Mayer’s condi- 
tions for meaningful learning, and it is as- 
sumed that note taking is more likely to en- 
courage subjects to activate relevant past 
experiences (Condition 3) and use them as 
an assimilative set. A weak prediction of 
this theory is that more should be learned 
with note taking because there are more 
cognitive “anchors” to attach new material 
to; unfortunately, this interpretation leads 
to the same predictions as the attention and 
effort theories. However, a strong predic- 
tion of the generative theory is that quali- 
tatively different learning outcomes should 
result, since the note takers are more likely 
to integrate new information with old, and 
non-note takers are more likely to encode the 
information as presented. Thus, the gen- 
erative theory predicts qualitative differ- 
ences in what is learned, as manifested in the 
result that note takers should excel on far-transfer
problems while non-note takers
excel on retention and near-transfer problems.
The main goal of this article is to
provide information concerning these theories
of the encoding processes and structures
underlying the mathemagenic activity of
note taking.

Further, previous research (Aiken et al., 
1975; Eisner & Rhode, 1959; Mayer, 1975) of 
Aptitude X Treatment interactions suggests 
that subjects with high abilities may be able 
to activate meaningful learning sets at will, 
thus probably not needing an overt activity 
that helps activate such sets. This idea was 
also explored in the present studies by de- 
termining whether note-taking activity had 
the same effect on the learning outcomes of 
high- and low-ability learners. 


Experiment 1 


The goal of Experiment 1 is to determine 
whether subjects who take notes acquire 
learning outcomes different from those of 
subjects who do not take notes while viewing 
a short lecture. A second question is 
whether or not providing an outline of the
material influenced learning. 


Method 


Subjects and design. The subjects were 60 Univer- 
sity of California at Santa Barbara students, who par- 
ticipated in the experiment in order to fulfill a re- 
quirement of their introductory psychology course. 
Five subjects served in each cell of a 2 X 3 X 2 factorial
design, with the first factor being whether or not the 
subject could take notes (notes vs. no-notes groups) and 
the second factor being whether the subject received an 
“advance organizer” before the lesson, after the lesson, 
or not at all; the third factor was whether the subject 
had a Mathematical Scholastic Aptitude Test (MSAT) 
score above or below 550 (high vs. low ability). 
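
The cell structure of this design can be enumerated directly. The following is a minimal sketch; the factor labels are shorthand chosen here for illustration and are not names taken from the study materials.

```python
from itertools import product

# Factor levels for Experiment 1 (labels are illustrative shorthand).
NOTE_TAKING = ["notes", "no-notes"]
ORGANIZER = ["before", "after", "none"]
ABILITY = ["high", "low"]  # MSAT above or below 550

SUBJECTS_PER_CELL = 5


def design_cells(*factors):
    """Return every combination of factor levels, one tuple per cell."""
    return list(product(*factors))


cells = design_cells(NOTE_TAKING, ORGANIZER, ABILITY)
total_subjects = len(cells) * SUBJECTS_PER_CELL

print(len(cells))       # number of cells in the 2 X 3 X 2 design
print(total_subjects)   # total subjects implied by 5 per cell
```

Crossing the three factors yields 12 cells, which at five subjects per cell accounts for the 60 subjects reported.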

Material and apparatus. A 16-minute lecture concerning
a simplified version of FORTRAN computer
programming was recorded on Sony videotape. The 
lecture was adapted from a text used by Mayer (1976); 
the lecturer used almost the same words as in the text 
and wrote example statements on the chalkboard. The 
organization of the lecture was as follows: A brief introduction
was given, then each of the seven statements
was explained, and an example was given for each. The
seven statements began with READ, WRITE, EQUALS,
CALCULATE, GO TO, IF, or STOP. A set of 12 test
questions was constructed, with each question typed
onto a 3 X 5 inch (7.5 X 12.5 cm) index card. Half the 
items were generation-type questions, in which a subject 
was asked to write a program (or statement) to solve a 
problem that was stated in English, for example, “Given 
that a card with a number on it is input, write a program 
to print out that number unless it is greater than 5.” 
The other half of the problems were interpretation-type 
questions, which presented a program (or statement) 
and asked the subject to tell how many cards would be 
input, how many numbers would be output, and what 
problem is solved. In addition, for each problem type, 
two items dealt with single statements, two dealt with 
nonlooping programs, and two dealt with looping pro- 
grams. Interpretation-type items were least similar to 
how information was presented in the text and therefore 
represented “far” transfer. 

The advance organizer consisted of a short outline of 
the lesson. Each of the seven computer statements 
along with a one-line explanation of the statement's 
function were given. In addition, there was a subject 
questionnaire that asked about previous experience 
with computer programming and mathematics and that 
consisted of six short algebra substitution problems 
(e.g., Y = X + 5, X = 5, Y = __). Apparatus
included three 11-inch (27.5 cm) Sony video monitors 
connected to a Sony AV3600 video recorder/player. 

Procedure. In each session, three or fewer subjects 
were assigned (randomly) to one treatment and were 
seated in separate cubicles facing a video monitor 3 feet 
(1 meter) away. The subject questionnaire was ad- 
ministered, and each subject was asked about previous 
programming experience. Instructions to pay close 
attention and be prepared for a short test were read. 
Then, the organizer-before group was given the outline 
to read, but it was removed before the tape began; the
organizer-after group was given the same outline after
the tape, but it was removed before the test. Subjects 
in the notes group were given a sheet of paper and told 
it was to their benefit to take notes. All subjects in this 
condition did take notes, and the notes were collected 
before the test. 

The tape was played, and then the 12 question cards 
and an answer sheet were given to each subject. The 
cards were randomized, except for the constraint that 
two problems of each type occurred together, and the 
deck alternated between interpretation and generation 
problems. Subjects were told to work at their own rates 
without feedback, to work on only one problem at a 
time, and to not go back to previous cards. Again, at the
end of the session, subjects were asked to indicate prior 
familiarity with the concepts presented. 


Results and Discussion 


Scoring. One subject expressed prior 
familiarity with the programming concepts 
presented in the experiment, and two 
subjects could not correctly answer more 
than three of the six algebra problems on the 
subject questionnaire. Data for these 
subjects were eliminated, and new subjects 
were run in their place. 

Each subject's response for each test item
was scored along a 2-point scale. For interpretation
answers, giving the number of cards
correctly read in and out received 1 point,
and stating the problem in a sentence (e.g.,
“Print each number until an 88 appears”)
was worth 1 point. Generation answers were
counted as correct if they contained the right
statements in the right order, even if there
were format errors, for example, “READ A1”
instead of “READ (A1),” or even if there were
grammar errors, for example, “GO TO P3 IF
(A1 = 3)” instead of “IF (A1 = 3) GO TO P3.”
Partial points were given in looping state- 
ments if the program was correct except for 
the order of the statements. 

Patterns of transfer performance. Table 
1 gives the mean proportion correct response 
by type of problem for the two treatment 
groups partitioned into high and low ability. 
An analysis of variance was performed on the
results of Experiment 1, with note taking,
outline, and ability as between-subjects 
factors (2 X 3 X 2) and problem type and 
problem difficulty as within-subjects factors 
(2 X 3). 

There were no significant main or interactive
effects for whether a subject received
the outline before learning, after
learning, or not at all. Therefore, the Ex- 
periment 1 data given in Table 1 are col- 
lapsed over this variable. The failure to 
obtain an outline effect was probably related 
to the fact that the outline was extremely 
similar to the presented material. Scandura 
and Wells (1967), following Ausubel's (1968) 
work in prose learning, have suggested that 
for an advance organizer to be useful in 
problem solving, it must be “at a high level 
of abstraction, generality, and inclusiveness" 
(p. 295). Although the last requirement was
probably met, the first two were probably 
not met. 

High-math-ability subjects performed 
better overall than low-math-ability subjects,
F(1, 48) = 14.26, p < .001, and there
were no significant interactions with prob- 
lem type or problem difficulty. 

Note taking had no overall effect, F(1, 48) 
= 1.26 (ns), on posttest performance, but it 
did have an effect on the type of question 
that subjects could answer: For Notes X 
Problem Type, F(1, 48) = 4.01, p < .05. The
interaction supports the observation that 
notes subjects performed better on inter- 
pretative problems, while no-notes subjects 
performed better on generative problems. 
These results fail to support the attention or 
effort predictions that the note-taking 
subjects should outperform the non-note 
takers; instead, the results are most consis- 
tent with the generative theory’s idea that 
note takers acquired broader learning out- 
comes that integrated new material with old 
but lost some of the details. There was also 
a significant three-way interaction involving 
notes, problem type, and ability, F(1, 48) = 
6.91, p < .025. This interaction can be 
summarized by stating that the Notes X 
Problem Type interaction was much 
stronger for low-ability subjects than for
high-ability subjects.

Since the effect of note taking was strongest
for low-ability subjects, a separate
analysis of variance was performed on the 
data for low-ability subjects. Again, there 
was no main effect for notes, F(1, 28) = 1.43
(ns), but there was a significant interaction
for Notes X Problem Type, F(1, 28) = 15.57,
p < .001. Apparently, note taking helps
activate a meaningful learning set (Condition 3)
for low-ability subjects and encourages
an integrative encoding of the material;
however, high-ability subjects have devel- 
oped a strategy of using an active, integrative 
encoding regardless of external mathe- 
magenic activities such as note taking. For 
example, for high-ability subjects, the Notes 
X Problem Type interaction is not signifi- 
cant (F < 1). 


Experiment 2 


Method 


Subjects and design. The subjects were 48 Univer- 
sity of California at Santa Barbara students recruited 
from the same pool as in Experiment 1. Six subjects 
served in each cell of a 2 X 2 X 2 factorial design, with 
the first factor being whether or not the subjects could 
take notes (notes vs. no-notes groups), the second factor 
being whether the subjects saw a videotaped lecture 
(video group) or read the same lesson in text form (read 
group), and the third factor being whether the subject 
had an MSAT score above or below 550 (high vs. low
ability).

Materials and apparatus. A 22-minute videotaped 
lecture on the use of the chi-square test was adapted 
from Lovejoy’s (1975) Statistics for Math Haters. The
organization of the presentation was as follows: a brief
introduction to the concept of chi-square, a statement 
of the formula and a description of the symbols, an ex- 
ample using a simple 2 X 2 contingency table, a de- 
scription of the solution steps, an explanation of the 
concept of degrees of freedom, an explanation of how 
to use the tables and draw conclusions, and a summary 
of the entire lesson. The words and figures used in the 
lecture were also converted into a written text of 15 
typewritten pages. 

The posttest consisted of six near-transfer and six 
far-transfer problems. Each question was typed on a 
separate 4 X 5½ inch (10 X 14 cm) sheet of paper, and
the 12 questions were stapled to form a booklet. The 
near-transfer questions asked subjects to apply the 
formula to problems similar to those given in the lesson, 
such as computing a value of chi-square for a 2 X 2 or 2 
X 3 table when both observed and expected values of 
each cell were given. Far-transfer questions required 
going beyond the specific information in the lesson; for 
example, one question presented an impossible chi- 
square situation (to which the subject should respond 
that the question cannot be answered), and another 
question asked subjects to state what would happen to 
the value of chi-square in any situation where all the 
observed values in each cell are doubled. 
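
The doubling question can be checked numerically. The sketch below is an illustration under a stated assumption, not the lesson's answer key: it assumes the expected frequencies are derived from the table's own marginal totals (the usual test of independence), and the cell counts are made up for the example. Under that assumption, doubling every observed count doubles the chi-square value.

```python
def chi_square(observed):
    """Pearson chi-square for a contingency table given as a list of rows.

    Assumption: expected counts come from the row and column totals
    (test of independence), not from externally supplied values.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Hypothetical 2 X 2 table of observed counts (not data from the study).
table = [[10, 20], [30, 40]]
doubled = [[2 * count for count in row] for row in table]

original_value = chi_square(table)
doubled_value = chi_square(doubled)
print(original_value, doubled_value)
```

If the expected values were instead held fixed while the observed values doubled, the result would differ, so the answer depends on how the expected frequencies are defined.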

In addition, the same subject questionnaire, video- 
tape equipment, and partitioned booths were used as 
in Experiment 1. 

Procedure. The procedure corresponded to Exper- 
iment 1, except that the read subjects were given 
booklets and the video subjects watched a tape concerning
chi-square.


Results and Discussion


Scoring. Three subjects expressed prior 
familiarity with the chi-square test pre- 
sented in the experiment, and four subjects 
could not correctly answer more than 50% of 
the algebra pretest items. Data for these 
subjects were eliminated, and new subjects 
were run in their places. 

Each subject’s response for each of the 12
test items was scored on a 2-point scale. The 
near-transfer questions were scored as cor- 
rect (2 points) if the subject gave the correct 
chi-square value and degrees of freedom. 
Far-transfer questions were counted as cor- 
rect (2 points) if the subject answered that 
a question was not amenable to a chi-square 
test (where that was the case), or if the 
subject indicated correctly the direction of 
change of chi-square for different situations. 
Partial credit was given for both types of
questions when only part of the question was
correctly answered.

Patterns of transfer performance. An 
analysis of variance was performed with note 
taking, presentation mode, and ability as 
between-subjects factors and problem types 
as a within-subjects factor. 

Table 1 gives the mean proportion correct 
response by type of question for notes and 
no-notes subjects of high and low ability. 


Table 1
Proportion Correct Response on Transfer Posttest by Problem Type for Four Groups for
Experiments 1 and 2

Ability level and          Experiment 1                 Experiment 2
treatment group       Generative  Interpretive    Near transfer  Far transfer
Low ability
  Notes                  .89          .56              .36            .46
  No notes               .49          .33              .46            .32
High ability
  Notes                  .67          .62              .63            .52
  No notes               .60          .60              .52            AL

Note. For Experiment 1: effect of ability, p < .01; Notes X
Problem Type, p < .025; Notes X Ability X Problem Type, p
< .025. For Experiment 2: effect of ability, p < .05; Notes X
Problem Type, p < .08; Notes X Ability X Problem Type, p <
.10; Notes X Problem Type (for low ability), p < .05; Notes X
Problem Type (for high ability), ns.
An analysis of variance revealed no overall
effect due to presentation mode (F < 1) and
no statistically significant interactions in- 
volving this variable. Therefore, the Ex- 
periment 2 data in Table 1 have been col- 
lapsed over this variable. Apparently, note 
taking and ability have similar effects re- 
gardless of presentation mode; similar re- 
sults have been obtained by Koran (1971) 
and by Acheson, Tucker, and Zigler (Note 4). 
Also, as expected, high-ability subjects per- 
formed better overall than low-ability 
subjects, F(1, 40) = 4.62, p < .05.

In Experiment 1, note takers performed
better on far-transfer and worse on near-transfer
questions than non-note takers, and
this was particularly true of the low-ability
subjects. The main focus of Experiment 2
was to determine whether these results could
be replicated in a new situation, using different
materials (statistics rather than
computer programming), a different test
(computing values or answering judgment
questions), and a different presentation
mode (text and video). As in Experiment 1,
there was no overall effect for note taking,
F(1, 40) = 1.36 (ns). However, the Notes X
Problem Type interaction found in Experiment 1
was in the same direction in Experiment 2,
but it reached only marginally
significant levels, F(1, 40) = 3.45, p < .08.
There was also the same trend as in Experiment 1,
in which the Notes X Problem Type
interaction was much stronger for low-ability
subjects than for high-ability subjects;
however, again, the Notes X Problem Type
X Ability interaction reached only marginal
significance, F(1, 40) = 3.06, p < .10.

As in Experiment 1, separate analyses of 
variance were performed for each ability 
group. While there was no overall effect for 
notes in either ability group and no Notes X 
Problem Type interaction for the high- 
ability group, there was a significant Notes 
X Problem Type interaction for the low-ability
group, F(1, 20) = 5.54, p < .05.
These results are generally consistent with 
those of Experiment 1. 
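
The direction of the Notes X Problem Type interaction for low-ability subjects can be read off the Experiment 2 cell means in Table 1. The sketch below computes a simple difference-of-differences contrast from those four means; it is descriptive arithmetic only and is not the analysis of variance reported above. The function and variable names are illustrative.

```python
# Experiment 2 cell means for low-ability subjects, as listed in Table 1.
LOW_ABILITY_MEANS = {
    "notes": {"near": 0.36, "far": 0.46},
    "no_notes": {"near": 0.46, "far": 0.32},
}


def interaction_contrast(means):
    """Difference of differences: (far - near) for notes minus no-notes."""
    notes_shift = means["notes"]["far"] - means["notes"]["near"]
    no_notes_shift = means["no_notes"]["far"] - means["no_notes"]["near"]
    return notes_shift - no_notes_shift


contrast = interaction_contrast(LOW_ABILITY_MEANS)
print(round(contrast, 2))
```

A positive contrast reflects the crossover pattern: note takers did relatively better on far transfer, while non-note takers did relatively better on near transfer.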


Experiment 3 


Experiment 3 was intended to more
closely determine what was learned by the
note takers and non-note takers by analyzing


Method 


Subjects and design. The subjects were 40 Univer- 
sity of California at Santa Barbara students, who par- 
ticipated in the experiment in order to fulfill an intro- 
ductory psychology course requirement. Twenty 
subjects served in each group of a two-group design 
(notes or no-notes group). 

Materials and apparatus. The same videotaped
lecture on computer programming and videotape apparatus
as used in Experiment 1 were used in Experiment 3.
Also, the same subject pretest was used as in
Experiment 1.

Procedure. The learning phase corresponded to Ex- 
periment 1, except that there were no organizers. After 
the tape, all notes were collected and subjects were 
asked to write down on a new sheet of paper all they had 
learned about certain statements as if they were
teaching a new subject who would be coming in for the 
next session. Subjects were asked to recall READ, IF, 
and CALCULATE statements, working at their own rates 
and completing one test before starting another. Or- 
ders of the three recall tests were randomized. 


Results and Discussion 


Scoring. Four subjects could not cor- 
rectly answer all the pretest questions, and 
two subjects did not follow directions on how 
to recall the material. Data for these 
subjects were eliminated, and new subjects 
were run in their places. 

The text for the three recall lessons was 
broken down into 64 idea units. All idea
units in the protocols were classified as ei- 
ther 1 of the 64 idea units from the text (ei- 
ther as correct or incorrect), as a summary, 
an intrusion, or a connective. A correct idea 
unit could be substituted (with minor mod- 
ification) into the text without altering the 
meaning; an incorrect idea unit dealt with 
the same particular idea but contained a 
factual error, such as saying “The first blank 
after the equals is a number.” An idea unit
in the protocol was scored as a summary if it
conveyed information on a general level that
was more than a single idea unit alone. For 
example, the following would be counted as 
a summary for CALCULATION statements: 
"Computers are able to perform calculations 
given certain information." An intrusion is 
similar to a summary, except that it comes
from one or more idea units that were not in
the to-be-recalled section of text; for example,
if the subject mentioned “The computer
follows the statements in order” when talk- 
ing about READ statements, that would be 
counted as an intrusion because that infor- 
mation is contained in another section of the 
text. An idea unit in the protocol was 
counted as a connective if it was used to tie 
other idea units together without adding any 
new information; an example is a subject 
saying “And that is basically what is in- 
volved.” 

All scoring of the protocols was done by 
two blind reviewers. Neither reviewer knew 
which group the protocol came from. Dis- 
agreements were resolved by consensus of 
the two reviewers. Disagreements occurred 
on less than approximately 10% of the idea
units. 

The 64 idea units from the text were 
grouped into three categories: 14 ideas 
concerning technical symbols such as “the
names of the address spaces are A1, A2, A3,
...” or “An example statement is READ
(A2),” or “The symbol for greater than is >”
(technical idea units); 14 ideas concerning 
the format of the statements such as “An 
address name goes in the first blank after the 
equals” or “A statement number follows the 
GO TO” (format idea units); and 36 ideas 
concerning the structure of the computer 
such as “The number in memory space A2 is 
erased” or “The sum is placed in memory
space A2” (structure idea units).

Several additional measures were con- 
structed to indicate the organization of the 
recall protocols. The “verbatim index” was 
constructed by determining the number of 
adjacent pairs in the protocol that were also 
adjacent idea units in the original text and 
dividing that by the total number of recalled 
pairs. The “order index” was constructed by 
determining the number of adjacent pairs in 
the protocol where the second member came 
somewhere later in the original text than the 
first member of the pair and dividing that by 
the total number of recalled pairs. The 
“category index” was determined by count- 
ing the number of adjacent pairs of idea units 
in the protocol that came from the same 
section of the lesson and then dividing that 
by the number of pairs recalled minus two. 
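
The three indices can be made concrete with a short sketch. It assumes that each protocol has been coded as a list of idea-unit numbers (numbered in text order) and that a lookup gives the lesson section of each unit. The function name, the coding scheme, and the example data are illustrative assumptions, as is the reading of "adjacent in the original text" as "immediately following."

```python
def organization_indices(recalled, section_of):
    """Return (verbatim, order, category) indices for one recall protocol.

    recalled   -- idea-unit numbers in the order the subject produced them
                  (numbers follow the order of the original text)
    section_of -- mapping from idea-unit number to its lesson section
    """
    pairs = list(zip(recalled, recalled[1:]))  # adjacent pairs in the protocol
    n_pairs = len(pairs)
    if n_pairs <= 2:
        raise ValueError("need at least four recalled idea units")

    # Verbatim: the second unit immediately follows the first in the text
    # (assumed reading of "adjacent idea units in the original text").
    verbatim = sum(1 for a, b in pairs if b == a + 1) / n_pairs

    # Order: the second unit comes anywhere later in the text than the first.
    order = sum(1 for a, b in pairs if b > a) / n_pairs

    # Category: both units come from the same section; the divisor is the
    # number of pairs minus two, as stated in the text.
    same_section = sum(1 for a, b in pairs if section_of[a] == section_of[b])
    category = same_section / (n_pairs - 2)

    return verbatim, order, category


# Hypothetical coding: 24 idea units, eight per section.
section_of = {unit: (unit - 1) // 8 for unit in range(1, 25)}
protocol = [1, 2, 3, 10, 11, 9, 20, 21]
verbatim, order, category = organization_indices(protocol, section_of)
print(round(verbatim, 2), round(order, 2), round(category, 2))
```

In this example the subject recalls the three sections in blocks, so the protocol contains exactly two between-section transitions. One plausible reason for the "minus two" divisor is that such perfectly blocked recall then scores 1.0 on the category index; the text does not state the rationale.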

In addition, the number of words and
number of symbols in the protocols were
tallied for each subject. A word was any
English word, and a symbol was any
mathematical symbol (such as + or =), any
computer symbol (such as A1, P3, or GO TO), or
any number (such as 35 or 15).

Table 2
Average Number of Idea Units and Additional Units Present in Recall Protocols for
Experiment 3

                      Idea units^a                      Additional units
Treatment    Technical   Format   Structure   Connectives   Summaries   Intrusions
group          (14)       (14)      (36)
Notes           5.7        5.7       7.2          1.0          4.5          3.9
No notes        5.7        3.6       4.7          1.6          5.9          2.4
t               <1         1.88      2.07         1.88        −1.55         1.30
p               ns         <.07      <.05         <.07         ns           ns

Note. For interaction between treatment group and type of unit, p < .025.
^a Numbers in parentheses for idea units indicate total number possible.


What is recalled? One basic question 
addressed in Experiment 3 is whether there 
were any differences in which of the 64 idea 
units were recalled and added (intrusion, 
connective, or summary) by the two treat- 
ment groups. Table 2 shows the average 
number of idea units recalled for each of the 
three main types (technical, format, and 
structure) and also shows the average num- 
ber of connectives, summaries, and intru- 
sions that subjects in the two groups in- 
cluded in their protocols. As can be seen, 
the notes group tended to recall more format 
and structure idea units and to produce more 
intrusions, but the no-notes group tended to
perform just as well on recalling technical
units and had more summaries and connectives.
An analysis of variance performed on
these data indicated a statistically reliable 
interaction between note-taking treatment 
and type of protocol item, F(5, 190) = 2.86, 
p <.025. 

These results are consistent with the 
Group × Type of Problem interactions obtained
in Experiments 1 and 2. In those
studies, the notes group performed partic- 
ularly well on problems requiring far trans- 
fer; in the present experiment, the notes 
subjects seem to remember more about how 
the computer operates and also give some 
evidence that they have attempted to connect
the new information with other ideas (as
indicated by intrusions). Also, in the preceding
studies, the no-notes group performed
particularly well on problems that
require applying the technical information 
that was presented; in the present experi- 
ment, the no-notes subjects seem to have 
concentrated on learning the technical 
symbols and have a much vaguer under- 
standing of the structure of the computer (as 
indicated by high levels of summaries and 
connectives). 

How is recall organized? A second major
question addressed in Experiment 3 is
whether the two treatment groups organize
the new information in the same way. The
averages for the verbatim index were .30 for
notes and .20 for no notes, t(38) = 2.20, p <
.05; for the order index, .62 for notes and
.42 for no notes, t(38) = 3.03, p < .01; and for
the category index, .93 for notes and .90 for
no notes (t < 1). These results suggest that
the notes group was more coherent in its
pattern of recall.

Can the two groups be discriminated from 
one another? A third question is whether it 
is possible to determine a subject's treatment 
group based only on his or her recall perfor- 
mance, and if so, which are the distin- 
guishing characteristics? In order to help 
answer this question, a stepwise discrimi- 
nant function analysis was performed using 
11 variables as factors. The discriminant 
analysis selected 7 of the 11 factors to be included
in the discriminant function; the
factors and their standardized function coefficients
(indicated in parentheses) are shown
below. High scores on structure (−.95) and
format (−.51) idea units, intrusions (−.49),
and ordered recall (−.54) loaded onto the
notes centroid; while high scores in recalling
technical ideas (+.58), connectives (+.41),
and verbatim recall (+.55) loaded onto the
no-notes centroid. This function was able 
to correctly assign 90% of the cases to groups. 
The four nonselected factors were category 
index, summaries, words, and symbols. 

A validation analysis was performed in 
order to test the generality of the obtained
discriminant function. Fourteen subjects 
from each group were randomly selected and 
used to produce a discriminant function. 


function; in addition, the function correctly 
predicted the group membership of 83% of
the remaining 12 subjects, even though the
data from those subjects were not used to 
develop the function. 


General Discussion 


These results are consistent in some respects
with previous findings, including
those reporting that note taking did not re- 
sult in better performance than not taking 
notes. If we had looked only at measures of 
retention and near transfer in Experiments 
1 and 2, as has traditionally been done, we 
too would have found no evidence for supe- 
rior performance due to note taking. Fur- 
ther, if we had looked only at the overall 
amount of recall in Experiment 3, as has 
traditionally been done, we again would have 
found no strong effect due to note taking.

However, when we attempted to assess the 
learning outcomes of note takers versus 
non-note takers, important differences 
emerged. In Experiments 1 and 2, there was 
a consistent pattern of Treatment × Posttest
interaction, especially for low-ability 
subjects, in which note takers performed 
worse on near transfer but better on far 
transfer than non-note takers. Experiment 
3 provided complementary evidence: There
was a corresponding Treatment × Recall
interaction in which note takers recalled
information most likely to support far
transfer (conceptual idea units and intrusions),
while non-note takers excelled on
technical idea units (useful for near transfer)
and included more vague summaries and
connectives. 

These results provide consistent support
for the predictions of the strict version of a
generative theory rather than attention or 
effort theories, 


formation with past experience to form a 
broader learning outcome, while non-note 
takers were more likely to add new facts to 
memory and form narrower outcomes. The 
view of the note taker provided in this study
is one of an active learner, trying to see the 
interconnections in what he or she is learning 
and trying to see how it all relates to some 
underlying system (e.g., the
computer for the computer programming 
lesson). The non-note taker, on the other 
hand, is simply trying to catch the main 
points and is particularly interested in 
learning the fragments of the technical 
symbols and examples that will allow ac- 
ceptable performance on a later test. 

Certainly, the current set of experiments 
represents just one test of the predictions,
and more work is required to determine the 
limits of each theory posed above. In the 
present studies, we used unfamiliar material; 
different results might be expected with 
more familiar materials. For example, 
subjects might “automatically” use an assi- 
milative encoding strategy, thus eliminating 
any effect for note taking. Also, in the 
present study, our effects were especially 
strong for low-ability subjects; again, this 
implies that our high-ability subjects in 
Experiments 1 and 2 might have been more 
likely to automatically use assimilative en- 
coding strategies regardless of instructional 
manipulation. A tentative hypothesis is 
that when the material is unfamiliar and 
subjects have not developed efficient en- 
coding strategies, mathemagenic activities 
aimed at encouraging assimilative rather 
than additive encoding are likely to be most 
successful.


difficulty. Cognitive ability interacted with
the group scoring high on the Scholastic Aptitude Test performed better than
the low scorers when the instruction entailed more errors and less time on
task. Qualifications, implications, and the need for considering relevant
learner characteristics when prescribing optimal instruction are discussed.

This article presents a portion of the findings of the
author’s doctoral dissertation submitted to the University
of Texas at Austin. Results were presented at the
meeting of the American Educational Research
Association, New York, April 1977. The research
described in this article was supported in part by
National Institute of Education Contract NIE-C-74-0089,
Correlates of Effective Teaching (University
of Texas at Austin). The opinions expressed herein do
not necessarily reflect the position or policy of the
National Institute of Education, and no official
endorsement by that office should be inferred.

The instructional difficulty level that
best facilitates learning has been examined
in a number of different contexts. Effects
of higher-order and lower-order teacher
questions in classrooms have been studied,
and error rates have been manipulated and
tested for effectiveness in enhancing learning
in programmed instruction tasks. Resultant
findings have not indicated that a single best
difficulty level exists for optimally promoting
the acquisition of information for all
types of learners in all situations. Behaviorally
oriented researchers have consistently
maintained that errors in student responding
should be minimized, that is, kept
in the neighborhood of no more than about
5% of all responses (e.g., see Holland, 1967;
Skinner, 1961). Others have presented and
discussed data that indicate that learning
conditions that entail less redundancy and
more challenge (and hence a relatively high
error rate) are more likely to aid the learner
in acquiring new information (e.g., Bruner,
1960; Pressey & Kinzer, 1964; Turner &
Durrett, Note 1).

The present study addresses the issue of
the optimal difficulty level of instruction by
using programmed materials to vary the
redundancy (difficulty level) of the instruction.
Some individuals were assigned randomly to
a low-error-rate or low-difficulty-level
treatment condition, while others were assigned
either to a medium- or a high-difficulty-level
treatment. By using programmed
instruction in presenting the information
to be learned, it was possible to
equate stimuli input to the subjects and vary
only the instructional components whose
effects were being tested.

level may be that there is no single range of
errors that is optimal for different types of 
learners in different situations. Cronbach 
(1957, 1975), Hunt (1975), and others have 
argued that relevant learner characteristics 
should be considered before an attempt is
made to prescribe optimal instruction. This
view has led to a number of studies conducted
in recent years that have increasingly
lent support to the notion that certain 
learner characteristics may, in fact, signifi- 
cantly moderate the effects of a number of 
instructional treatments (see Cronbach & 
Snow, 1977). 

Correlational data from the Texas
Teacher Effectiveness Project discussed in
Brophy and Evertson (1976) have suggested
that a learner attribute variable such as
socioeconomic status (SES) level can be an
important and significant moderator of the 
effectiveness of the difficulty level of ques- 
tions asked by teachers during academic 
discussions. The contrasting process- 
product correlations showed that the opti- 
mal level of correct responses varied with 
school SES such that the error rate at which 
lower SES pupils learn best was lower than 
the error rate that allowed the higher SES 
pupils to make the largest achievement 
gains. The lower SES students apparently 
performed better with easier questions, while 
the higher SES students learned more when 
challenged with difficult questions, which 
caused a higher number of incorrect re- 
sponses. However, since these data were 
correlational, and since the measure of stu- 
dent characteristics was only a broad proxy 
variable like SES, these data only suggest 
hypotheses—hypotheses that are being 
tested experimentally in this study. Also, 
SES level was interpreted by Brophy and 
Evertson as standing for a number of at- 
tributes, such as anxiety and general ability, 
which were thought to be the actual moderators
of the effectiveness of the varying difficulty
level of the teacher's instruction.
These correlational data and the accumulating
evidence supporting the aptitude-treatment
interaction approach to investigating
the effectiveness of different learning
strategies led to the formulation of the 
present study. 

The Atkinson and Feather (1966) theory 
of achievement motivation yields specific 
predictions concerning optimal difficulty 
levels for learners with varying levels of 
anxiety and achievement motivation. The 
context and predictions of the theory fit with 
the Brophy and Evertson (1976) correla- 
tional data, since they discussed the differentiated
SES findings as being caused by
different levels of anxiety and motivation (in 
addition to other factors) in the two SES 
subgroups. 

Briefly, the Atkinson-Feather theory 
predicts that individuals with a strong ten- 
dency to approach success (high need for 
achievement) and low fear of failure (low 
anxiety) would perform increasingly better 
as the difficulty level (error rate, or proba- 
bility of failure) of the instruction ap- 
proaches 50% but then decline in perfor- 
mance as the difficulty level becomes greater 
than 50%. Conversely, individuals with a
tendency to avoid failure (anxiety) that is 
greater than their tendency to approach 
success (motivation) should show an oppo- 
site pattern, that is, they would be predicted 
as showing a decreasing performance curve 
as the difficulty increases from 0 to 50% and
then predicted as showing an increase in
performance as the difficulty level goes from
50% to higher levels. These two predicted
performance curves, when plotted over the
full range of possible errors (from 0 to 100%),
would show a curve for high-need-achieve- 
ment, low-fear-of-failure individuals ap- 
proximating an inverted U shape, while 
low-need-achievement, highly anxious in- 
dividuals' performance curves would be U 
shaped. 
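
The two predicted curves can be illustrated with the usual textbook statement of the Atkinson model, in which the resultant tendency is (M_S − M_AF) × P_s × (1 − P_s), with P_s the probability of success. That formula is not given in this article and is supplied here as an assumption. The motive strengths below are arbitrary illustrative values, and difficulty is taken to be the probability of failure, 1 − P_s.

```python
def resultant_tendency(motive_success, motive_avoid_failure, difficulty):
    """Resultant achievement tendency at a given difficulty (probability of failure).

    Assumed textbook form of the Atkinson model: (Ms - Maf) * Ps * (1 - Ps),
    where Ps = 1 - difficulty is the probability of success.
    """
    p_success = 1.0 - difficulty
    return (motive_success - motive_avoid_failure) * p_success * (1.0 - p_success)


difficulties = [i / 10 for i in range(11)]  # 0%, 10%, ..., 100% error rate

# Arbitrary motive strengths, chosen only to give the two contrasting patterns.
approach_curve = [resultant_tendency(2.0, 1.0, d) for d in difficulties]
avoid_curve = [resultant_tendency(1.0, 2.0, d) for d in difficulties]

best_for_approach = difficulties[approach_curve.index(max(approach_curve))]
worst_for_avoid = difficulties[avoid_curve.index(min(avoid_curve))]

for d, a, v in zip(difficulties, approach_curve, avoid_curve):
    print(f"{d:4.0%}  {a:+.2f}  {v:+.2f}")
```

Under these assumptions the first curve is an inverted U that peaks at 50% difficulty, and the second is its mirror image, a U with its minimum at 50%.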

The rationale behind these predictions is 
that the difficulty levels around 50% would
be best for individuals with high need 
achievement, since this is neither too easy 
and boring nor overly difficult and frus- 
trating. But, for individuals with a strong 
fear of failure and low need achievement, 
instruction that is either very low or very 
high in difficulty is predicted as being optimal,
presumably because there is either an
actual low probability of failure or because
these individuals would attribute failure at
levels above 50% to the degree of the difficulty
of the task, thereby minimizing potentially
interfering anxiety responses by
escaping culpability for failure.

Considering these theoretical predictions
and the correlational empirical base
suggesting varying difficulty levels, data
were collected on affective characteristics
of learners, namely need achievement and fear of
failure (test anxiety), and also on general
cognitive ability. It was felt that analyses
employing these measures would yield 
valuable information on the efficacy of the 
Atkinson-Feather theoretical predictions, 
the Brophy and Evertson correlational 
findings, and the general usefulness of the 
aptitude-treatment interaction approach to 
the investigation of instructional effective- 
ness. The difficulty level of the instruction 
was manipulated while holding constant the 
semantic content of the instructional treat- 
ments. As will be indicated in the remain- 
der of this article, the results did show a 
number of statistically reliable aptitude-treatment
interactions and also provided an
empirical validation of the Atkinson- 
Feather theoretical predictions. 


Method 


Subjects 


Subjects were 152 college students in a preservice 
teacher education program at a major state university. 
They were randomly assigned to treatment groups by 
assembling the experimental packets in random order 
before passing them out to subjects. Fourteen subjects’ 
data were eliminated from the sample because they did 
not complete the experimental materials or failed to 
follow directions correctly. It was necessary to exclude 
male participants from all but the initial main effects 
analyses of sex differences because these analyses in- 
dicated that scores for males and for females differed 
significantly on several measures: prior coursework in 
areas relating to the content of the experimental ma- 
terials, a subjective rating of familiarity with the content 
of the materials, and the posttest score measuring in- 
formation acquired in the experimental treatments 
(regardless of treatment). Ideally, subject sex would 
have been used in all analyses as an additional experi- 
mental factor, but the number of males was so low (25) 
that this was not possible. The maximum ns for the 
difficulty-level analyses were the following: 22 in the 
low-error-rate treatment (high redundancy), 24 in the 
medium-error-rate treatment, and 24 in the high- 
error-rate condition (low redundancy). Other subjects 
were assigned to read-only treatments and did not enter 
into the analyses of instructional difficulty effects. 
Missing attribute data (primarily Scholastic Aptitude 
Test scores) and/or the selection of specified groups for 
aptitude-treatment interaction hypothesis testing 
lowered the ns further in some instances. 


Materials 


A 30-item, multiple-choice (5 options per item)
pretest-posttest was used to assess the amount of
learning resultant from the teaching materials. This
test covered the subject matter presented in the pro- 
grammed materials. The test items and the content of 
the teaching materials were derived from a standard 
college physiological psychology textbook (Morgan, 
1965). Physiological psychology was chosen as the 
content of the experimental treatments and the test 
because most subjects were not expected to have been 
exposed to this area extensively (pretest scores sup- 
ported this assumption). 

Programmed teaching materials formed the different 
difficulty level treatments employed in the experiment. 
The materials were designed by the author specifically 
for use in this investigation. All three sets of materials 
used in testing difficulty level hypotheses were con- 
structed response mode (fill in the blank). These dif- 
ferent sets of materials covered equal amounts of 
subject matter and had equal numbers of frames but 
differed in the average error rate within frames and 
across each set of materials. Previous research with 
these materials had established the error rates at about 
4% for the low-error-rate group, 15% to 18% for the 
medium-error-rate group, and 35% to 40% for the 
high-error-rate group. The average error rates of the 
materials used with the present sample of subjects were 
close to these earlier established levels: 6.9%, 21.9%, 
and 40.0%, respectively, for the three levels of difficulty.
These difficulty indices should be stable, since the fig- 
ures come from different studies using different samples 
of college students. (Data pertaining to the individual 
error rate distribution in each of these three treatment 
groups are presented in my more inclusive technical 
report [see Crawford, Note 2].) 

Each of the sets of experimental materials consisted 
of 40 fill-in-the-blank frames, 10 frames coming from 
each of four different content areas: the central ner- 
vous system, neuronal physiology, the internal envi- 
ronment, and the physiological basis of emotion. The 
answers to each frame were given in the margin next to 
the subsequent frame on the following page. This
format allowed the subject to fill in the blank on one 
frame, turn to the next page, check his or her answer for 
correctness, and then proceed to the next frame. The 
backs of the sheets had a strip of paper taped to them 
to keep the subjects from seeing the answers through the 
sheet. Subjects were instructed not to change their 
answers if they were wrong. 

Attention was given to construction procedures de- 
scribed in O'Day, Kulhavy, Anderson, and Malczynski 
(1971) in designing the materials. The difficulty level 
of the frames was manipulated by leaving out letters in 
partial presentations of responses, by omitting clueing 
phrases that act as aids for the subjects in constructing 
the response, and/or by reducing the redundancy of the 
wording of the frames to make the answers less obvious.
The required responses in parallel frames in the different
sets of materials were always identical.

Data on learner characteristics were derived from 
three instruments: (a) the Test Anxiety Scale (TAS), 
which is a more recent version of the Test Anxiety 
Questionnaire (Mandler & Sarason, 1952); (b) the 
Achievement Motivation scale from the Jackson (1967) 
Personality Research Form (PRF); and (c) the Verbal
and Quantitative subscales and total score from the 
Scholastic Aptitude Test (SAT).

The TAS was used in measuring fear of failure (referred
to in the motivation literature as the “motive to
avoid failure”). This is a self-report measure (37
true-false items) of how anxious the subject says he or 
she is in various testing situations. Sarason (1972) has 
reported the test-retest reliability of the TAS as ranging 
between .75 and .80. 

The Achievement Motivation scale of the PRF has 
been shown to have high internal consistency and high 
predictive validity. Clarke (1973) maintains that the 
PRF scales are the best measures of achievement and 
affiliation motivation, and that the measures are relatively
free from response set effects like social desirability
and acquiescence. Only the achievement motivation
items were selected for use in this experiment;
there were 20 true-false items, 10 scored as true and 10 
scored as false. 

Since both the TAS measure and the Achievement 
Motivation scale of the PRF were composed of true- 
false items, they were combined (alternating items as 
much as possible) into a 57-item questionnaire, which 
was administered to subjects prior to pretests or any 
instructional treatments. For the constructed response 
subsample (N = 70), the correlation between anxiety 
and achievement motivation was −.11 (ns). The correlation
of these measures with the posttest score was
−.09 (ns) for motivation and −.23 (p < .05) for anxiety.

Consent forms given to subjects asked for permission 
to have access to grade point average and SAT data on
file at the university. Copies of these signed forms were
delivered to the registrar's office, which then supplied
the required information. There was a fairly high rate
of missing data on these ability measures for students 
who had recently transferred to the university. 


Procedure 


The experiment was conducted at the same time of 
day in the same room over a period of 2 weeks; subjects 
were run in groups of 8 to 18. The affective measures 
were first administered to subjects; upon completion of 
the TAS and PRF measures, a 30-item pretest was ad- 
ministered. The content of the pretest was either the 
same as the posttest or was concerned with educational
psychology. Different forms of the pretest were used
to allow a test of familiarity effects of a pretest that was 
the same in content as the instructional materials and 
the posttest (this was in connection with another ex- 
perimental factor not discussed in this article). The 
familiarized group received the physiological psychology 
pretest, which was identical to the posttest, and the 
unfamiliarized subjects were given the educational 
psychology pretest. After the pretest was administered, 
all subjects proceeded through their set of programmed 
instruction materials. Subjects were instructed to turn 
in their packet of materials as soon as they finished; also, 
each subject’s time spent with the teaching materials 
was recorded to the nearest minute as he or she turned 
in the packet. This time-on-task measure was used in
some regression analyses as a continuous predictor. 
When all subjects had completed the experimental 
task and the posttest (given immediately upon com- 
pletion of the programmed instruction materials), the 
investigator described in detail the purpose of the study,
some of the hypotheses and possible results, and also
told the subjects where and when more detailed feed- 
back would be available. A biographical information 
sheet that was attached to the back of the posttest re- 
quired subjects to record their social security number 
and gender. As mentioned earlier, there was also a 
relevant-coursework checklist with listings for physi- 
ology, physiological psychology, introductory psychol- 
ogy, nursing, and other courses that might have in- 
fluenced the subjects’ prior level of knowledge in the 
area of physiological psychology. The checklist of 
courses was used for eliminating any subjects’ data that 
may have been confounded by previous coursework 
(those who checked more than two items were elimi- 
nated from the sample along with the 25 males). 
Scores on the physiological psychology pretest were 
computed and found to average less than 4 points more
than what would be expected from chance guessing (9.7 
out of a possible 30, when 6.0 would be expected by 
chance). Since not all subjects were administered the 
same pretest, those data were not used as covariates in
adjusting the posttest scores. Analysis of variance of 
only the posttest score was appropriate, since random 
assignment had produced initially equivalent treatment 
groups. 
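
The chance baseline quoted above follows directly from the test format (30 items with 5 options each). The short check below reproduces it; the variable names are illustrative.

```python
n_items = 30         # items on the pretest
n_options = 5        # response options per item
observed_mean = 9.7  # reported mean on the physiological psychology pretest

chance_score = n_items / n_options  # expected score from blind guessing
points_above_chance = observed_mean - chance_score

print(f"chance = {chance_score:.1f}, excess = {points_above_chance:.1f}")
```

The excess of 3.7 points agrees with the statement that pretest scores averaged less than 4 points above chance.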
For some analyses, median (extreme group) splits
were made within the treatment groups on the attribute
variables. The difficulty level of the instruction was
examined in two ways: by using the three treatment
groups as different levels of the difficulty factor and by
looking at individual subjects' error rates as a continuous
treatment predictor in conjunction with high or low
group membership terms for the attribute measures in
the regression analyses.


Results 


While there was no tendency for a main 
effect for the difficulty level of the instruc- 
tional treatment (F < 1.0), a number 0 
analyses showed statistically significant In- 
teractions with affective and cognitive 
learner characteristics. "Table 1 gives source 4 
data for the Difficulty Level X Affective 
Characteristics interactions. Table 2 pte 
sents relevant means and standard devia- 
tions for the dependent variable. his 
analysis, a contrasting pattern group com- 
parison, was made as a test of the predictions 
of the Atkinson-Feather (1966) theory 0j 
optimal difficulty level. First, subjects 
scores were plotted using the need achieve- 
ment and the anxiety scales in a pivariate 
scattergram. Then, median-split lines Were 
drawn perpendicular to each axis at the ap- 
propriate locations. i 


Contrasting pattern 
learner types came from the two quadrants: 
defined by low scores on one scale an 
scores on the other. Then, posttest me 


- Note. MS = motive to achieve success, 
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Table 1 


Analysis of Variance of Low-Need-Achievement, 
-Failure Subject: 


Need- Achievement, Low-Fear-o, 


Source SS 
Extreme groups (A) 5.8882 
Difficulty (B) 2.7548 
AXB 95.6796 
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High-Fear-of-Failure Subjects and High- 
s by the Difficult: Level o, 


the Instruction 


df MS F p 
1 5.8882 .3682 :5554 
2 1.8774 0861 9173 
2 47.8398 2.9913 0644 


Within 463.8000 29 15.9931 
Note, The dependent variable is the number of correct responses on the posttest. 


of low-need-achievement, high-fear-of- 
failure subjects and high-need-achievement, 
low-fear-of-failure subjects were entered into 
analyses of variance according to the diffi- 
culty level of the constructed response group 
to which they were assigned. 

The interaction between difficulty level 
and fear of failure/need achievement ap- 
proaches significance, with a p value of .064 
(10.996 of the posttest variance controlled by 
the interaction term). The low-need- 
achievement, high-fear-of-failure subjects 
did best in the low-error-rate condition and 
worst in the 4096 difficulty treatment, while 
the high-need-achievement, low-fear-of- 
failure subjects performed optimally with 
the low-redundancy, 40%-error-rate mate- 
rials and did worst if assigned to the group 
receiving the 6,9%-error-rate materials. The 
only treatment that did not show a separa- 
tion on the order of one standard deviation 
was the 21.9%-error-rate group—where the 
high-need-achievement, low-fear-of-failure 
group was only slightly superior to the low- 
need-achievement, high-fear-of-failure 
group of subjects. A plot of the means for 


these two contrasting pattern groups shows 
the relation to the Atkinson-Feather theo- 
retical predictions for increasing difficulty 
levels; the data did conform to the theoreti- 
cal predictions for the range of the theory 
that was tested (i.e., for difficulty levels 
below 5096). 

Difficulty level was also examined by using 
subjects' individual error rates as a contin- 
uous predictor variable. These regression 
analyses supported the analyses of variance 
presented in Table 1 by indicating that the 
two contrasting pattern subject groups in 
low- and high-difficulty treatments had 
significantly (p = .05) nonhomogeneous re- 
gression slopes. The low-need-achievement, 
high-fear-of-failure subjects performed in- 
creasingly worse on the posttest with an in- 
crease in error rate than did the high-need- 
achievement, low-fear-of-failure subjects. 
Apparently, the redundancy or difficulty level of the instruction and
the individual’s need achievement and fear of failure are influential
in determining the effect of instruction on the posttest measure of
learning.


Table 2
Cell Statistics for the Affective Characteristics × Difficulty Analysis of Variance

                                                   Average
Subject group           n       M         SD      % errors
Low MS - high MAF       5    22.2000    5.1672       6.9
Low MS - high MAF       6    19.5000    3.6194      21.9
Low MS - high MAF       6    18.6667    4.2740      40.0
High MS - low MAF       6    18.6667    3.8816       6.9
High MS - low MAF       6    21.0000    4.0497      21.9
High MS - low MAF       6    23.1667    2.9269      40.0

Note. MS = need achievement; MAF = motive to avoid failure, or fear of failure.


Table 3
Summary and Regression Statistics for Extreme Level Groups as
Determined by the Total Score on the Scholastic Aptitude Test (SAT) by
Time on Task and Error Rate (the Two Predictors of the Posttest Score
Criterion)

                                   Low SAT          High SAT
Statistic                       (total score)    (total score)
N                                    20               21
Means
  Posttest score                   19.9500          22.5714
  Time on task                     19.8500          20.7619
  Error rate                        7.5500           9.0000
Sigmas
  Posttest score                    4.1484           3.3177
  Time on task                      4.2223           4.5660
  Error rate                        6.7637           5.6904
Correlations
  Time on task/posttest score        .5184           -.4280
  Error rate/posttest score         -.6141           -.2018
  Time on task/error rate           -.1407            .1228

Note. The regression equation for the low SAT group is Y = 13.0386 +
.4956X - .3876Z. The regression equation for the high SAT group is Y =
29.5420 - .2974X - .0883Z. For the test of homogeneity of group
regressions, F(2, 35) = 7.2354, p = .0027. Considering only time on
task, for the test of homogeneity of group regressions, F(1, 35) =
11.2224, p = .0023. Considering only error rate, for the test of
homogeneity of group regressions, F(1, 35) = 3.1739, p = .0800. The
region of significance is a hyperbola with equation (X² ÷ -59.5108) +
(Z² ÷ 15.1925) = 1.
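
As a quick consistency check on the Table 3 note, the sketch below evaluates each reported regression equation at its own group's predictor means; a least-squares plane should return the group's mean posttest score. It assumes that X is time on task and Z is error rate, which the note does not state explicitly, and the function names are hypothetical.

```python
# Regression equations as reported in the Table 3 note.
# Assumption (not stated in the note): X = time on task, Z = error rate.

def predict_low_sat(time_on_task, error_rate):
    """Predicted posttest score for the low SAT group."""
    return 13.0386 + 0.4956 * time_on_task - 0.3876 * error_rate


def predict_high_sat(time_on_task, error_rate):
    """Predicted posttest score for the high SAT group."""
    return 29.5420 - 0.2974 * time_on_task - 0.0883 * error_rate


# Group means copied from Table 3.
LOW_MEANS = {"posttest": 19.9500, "time_on_task": 19.8500, "error_rate": 7.5500}
HIGH_MEANS = {"posttest": 22.5714, "time_on_task": 20.7619, "error_rate": 9.0000}

low_at_means = predict_low_sat(LOW_MEANS["time_on_task"], LOW_MEANS["error_rate"])
high_at_means = predict_high_sat(HIGH_MEANS["time_on_task"], HIGH_MEANS["error_rate"])

# Each prediction should land on the group's mean posttest score.
print(round(low_at_means, 2), round(high_at_means, 2))  # 19.95 22.57
```

Both predictions agree with the tabled posttest means to within rounding, which supports reading the high SAT error-rate coefficient as -.0883.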


The measures of cognitive ability that showed interactions with the
difficulty level of instruction were SAT total score and SAT Verbal
subscale score. For some analyses, subjects were grouped into low- and
high-median-split SAT groups. The use of low versus high SAT (total
score) groups in regression analyses with two continuous predictor
variables showed a significant interaction. With individual error rate
and time on task as predictors, regression planes of low and high SAT
groups of subjects were significantly nonhomogeneously sloped (p =
.003). Step-down analyses for each predictor separately showed that
both individual error rate and time on task were responsible for the
interaction: The time-on-task interaction was highly significant with a
p value of .002, and error rate showed a p value for the interaction of
.08. Table 3 gives the regression statistics, means, and standard
deviations for these analyses.

A two-dimensional representation of the 
regions of significance for this analysis (i.e., 
areas on the two-dimensional space defined 
by the two predictors where the groups were 
significantly different) indicates the condi- 
tions under which the low and high SAT 
groups performed best. Examination of the 
regions of significance suggests that there 
was actually no condition in the present 
study in which the low SAT group was su- 
perior. However, the extrapolated region 
(which has only one data point within it) was 
defined by more time on task and fewer in- 
structional errors. The region where the 
high SAT group was superior was defined 
primarily by less time on task and, to some 
degree, by more errors on the teaching ma- 
terials. That is, the high SAT group's curve 
for the superiority region had a lower pro- 
jection on time on task and a higher projec- 
tion on the error-rate axis. This confirms 
other analyses that showed high-ability 
subjects performing better under high- 
error-rate conditions. 

For example, extreme group comparisons were made according to the
difficulty level of the instructional treatment by SAT Verbal scores.
These analyses indicated that group assignment to high or low SAT
Verbal groups interacted significantly with the difficulty level. The
high SAT Verbal group did increasingly better with an increase of
difficulty (i.e., decrease in redundancy) of the instruction, while the
low SAT Verbal subjects did worst on the posttest if in the
high-difficulty treatment and best if assigned to the low-difficulty,
highly redundant treatment. The means and source data for these
analyses are presented in Tables 4 and 5. These data strongly suggest
that treatment difficulty and posttest score are related in a
monotonically increasing fashion for subjects with high verbal ability,
while the relationship is a consistently decreasing one for
low-verbal-ability subjects. As expected, these analyses also showed a
significant main effect for verbal ability, with the high-ability group
scoring higher on the posttest regardless of treatment.

An analysis that used SAT Verbal scores as a continuous aptitude
predictor also confirmed these analyses. When subjects were grouped by
the difficulty level of the treatment, there was a trend (p = .09) for
the low-difficulty, high-redundancy group and the highly difficult,
low-redundancy treatments to have nonhomogeneous regressions of posttest
scores onto SAT Verbal scores. The regression line of subjects in the
low-difficulty condition was basically flat; for this group, posttest
score was not strongly related to verbal ability. However, the highly
difficult, low-redundancy instructional treatment showed a positively
sloped regression line: Predicted posttest scores increased from 14.4
to 26.9, with a corresponding increase in SAT Verbal scores from 312 to
660.


Table 4
Analysis of Variance of Scholastic Aptitude Test (SAT) Verbal Score
(Low vs. High) by the Difficulty Level of the Instructional Treatment

Source                     SS        df       MS         F        p
SAT Verbal (A)          130.7319      1    130.7319    9.8015   .0038
Difficulty level (B)     19.6728      2      9.8364     .7375   .4898
A × B                   147.6460      2     73.8230    5.5348   .0083
Within                  466.8274     35     13.3379

Note. The dependent variable is the number of correct responses on the posttest.


Table 5
Cell Statistics for the Ability (SAT Verbal Subscale) × Difficulty Analysis of Variance

Subject group                             n       M         SD
Low SAT Verbal, low difficulty            6    22.6667    2.4221
Low SAT Verbal, medium difficulty         8    19.3750    4.5336
Low SAT Verbal, high difficulty           6    16.3333    5.2789
High SAT Verbal, low difficulty           7    21.7143    4.1115
High SAT Verbal, medium difficulty        7    22.7143    2.2147
High SAT Verbal, high difficulty          7    24.7143    1.9760

Note. SAT = Scholastic Aptitude Test.
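
The F ratios in the SAT Verbal by difficulty analysis of variance follow directly from the reported sums of squares and degrees of freedom. The short sketch below recomputes them as a check on the digits recovered from the table; the row labels for the difficulty and interaction terms are reconstructed names, and the helper function is hypothetical.

```python
# Sums of squares and degrees of freedom as reported in the ANOVA table.
# "Difficulty (B)" and "A x B" are reconstructed row labels (assumption).
EFFECTS = {
    "SAT Verbal (A)": (130.7319, 1),
    "Difficulty (B)": (19.6728, 2),
    "A x B": (147.6460, 2),
}
SS_WITHIN, DF_WITHIN = 466.8274, 35


def f_ratios(effects, ss_within, df_within):
    """Return {effect: (mean square, F)} with F = MS_effect / MS_within."""
    ms_within = ss_within / df_within
    table = {}
    for name, (ss, df) in effects.items():
        mean_square = ss / df
        table[name] = (mean_square, mean_square / ms_within)
    return table


anova = f_ratios(EFFECTS, SS_WITHIN, DF_WITHIN)
for name, (mean_square, f_value) in anova.items():
    print(f"{name:16s} MS = {mean_square:9.4f}  F = {f_value:7.4f}")
```

The recomputed values (about 9.80, 0.74, and 5.53) agree with the tabled F ratios, including the difficulty main effect of .7375.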


These interactions of SAT total and SAT Verbal scores with the
difficulty level of the instruction (considered either at the
level-of-difficulty treatment or individual error rate) suggest optimal
congruences between instructional treatments and learner aptitudes
along lines similar to those suggested by the Brophy and Evertson
(1976) correlational data, which used SES as a student characteristics
measure. The agreement between these two sets of results seems
noteworthy, since the samples and methods were quite different, and
cognitive ability is only one of the variables for which SES is a
proxy. The analyses of SAT Verbal and SAT total scores seem to support
the idea that the learner's present ability should be considered in
prescriptions of optimal instruction.


Discussion 


While the applicability of these findings 
should, strictly speaking, be restricted to 
college students (actually, female college 
students in a preservice teacher education 
program) working with highly structured 
programmed materials, the agreement with 
theoretical predictions and results from 
other studies suggests a broader generaliza- 
bility of results. The test of the Atkinson- 
Feather theoretical predictions in the 
present experiment does indicate support for 
the theory (and also replicates the Koehler, 
1973, study). The major qualification concerning these findings is
that even the highly difficult teaching materials elicited, on the
average, an error rate of only 40%, a rate below the midpoint (50%) of
the range with which the theory is concerned. Therefore, these data do
not bear upon the accuracy of theoretical predictions beyond the 50%
difficulty level, where the probability of success is less than the
probability of failure.

It may be that for a cognitive task of this 
type, the high-fear-of-failure, low-need- 
achievement group of subjects would not 
increase in performance with an increase in 
difficulty level beyond 50%. Still, the sup- 
port for the Atkinson-Feather theory is of 
some note, since many previous investiga- 
tions of this theory were concerned with 
preference instead of performance as an 
outcome or used noncognitive tasks, such as 
ringtoss or other skill games. The present 
findings suggest that the theory of achievement motivation can be
generalized to cognitive, academically oriented tasks as well, at least
for the difficulty range below 50% that was tested.

The other analyses showed that general 
and verbal ability can be significant moder- 
ators of the effectiveness of the difficulty 
level of the instruction (and, in one instance, 
showed an interaction with the time-on-task 
measure that was taken). Since substantial portions of variance are
common to measures of SES and cognitive ability (generally in the range
of 10% to 15%), the interactions of SAT scores with the difficulty of
the instruction may be seen, in some sense, as supporting the Brophy
and Evertson (1976) correlational findings concerning pupil success
rates in responding. It does make sense that learners with richer
cognitive structures would learn optimally under less redundant (i.e.,
more difficult) conditions. However, for less able subjects, the
instruction is probably best if it proceeds in smaller steps and
presents the information to be learned in a more redundant format.

These findings suggest the appropriateness of matching the instruction
to the learner by providing him or her with the necessary amount of
“instructional support” (Tobias, 1976) or “environmental structure”
(Hunt, 1975). That is, learners with a higher level of prior knowledge
or a higher conceptual level are capable of learning best under
conditions of less support and/or structure (in the present experiment,
less redundancy). And the data concerning fear of failure and need
achievement indicate that certain affective characteristics should be
considered (in conjunction with cognitive ability levels) in designing
instruction that minimizes the generation of self-oriented responses
not relevant to the learning task and therefore likely to interfere
with productive learning.

Empirical support was shown for Salo- 
mon’s (1972) conceptualization of heuristic 
models for aptitude-treatment interactions. 
Particularly, support is indicated for the 
remediation and compensation models. 
That is, to the extent that low verbal levels 
constitute task-specific attributes that may 
be remediated through a highly redundant 
instructional treatment, and high fear of 
failure is an aptitude that requires com- 
pensation in the form of easy-to-answer 
questions, data from this study may be said 
to support two of the three conceptualiza- 
tions of aptitude-treatment interaction 
heuristic models proposed by Salomon. 

The accumulation of data suggesting that learner characteristics do
significantly moderate the effectiveness of some instructional
treatments should have important implications for instructional design
in general and individualized instruction in particular. As a
relatively cohesive aptitude-treatment interaction literature becomes
available, research-based specification of how instruction should be
individualized should become feasible.

This study investigated the independent effects of level of reward
(recognition based on the performance of a four- to five-member
cooperative learning team vs. recognition based on individual
performance) and comparison of student quiz scores (comparison with
ability-homogeneous groups vs. comparison with entire class) on student
achievement and attitudes. The subjects were 205 seventh-grade students
in eight intact English classes. The experiment used a 2 X 2 (Reward
Level X Comparison Group) factorial design, in which students studied
grammar and punctuation for 10 weeks. Results indicated reward level
effects in favor of team reward and comparison group effects in favor
of the comparison with equals on percentage of time on task, positive
interpersonal perceptions, and other variables. No academic achievement
effects were found for either factor.


In the past few years, there has been in- 
creasing attention in the educational psy- 
chology literature to the use of student 
learning teams in the classroom. These are
instructional methods in which students 
work in small groups and are rewarded based 
on the success of the group in teaching its 
members academic material. Team tech- 
niques have had positive effects as compared 
to control treatments on academic perfor- 
mance (Lucker, Rosenfield, Sikes, & Aron- 
son, 1976; DeVries & Slavin, in press; Slavin, 
Note 1, Note 2), mutual concern (Aronson et 
al., 1975; DeVries & Slavin, in press), self- 
esteem (Aronson et al., 1975; Blaney et al., 
1977), and increased interracial friendship 
(DeVries, Edwards, & Slavin, 1978; Slavin, 
1977b). 

Two of the most successful student team techniques have been those
developed at Johns Hopkins University: Teams-Games-Tournament, or TGT
(DeVries & Slavin, in press), and Student Teams-Achievement Divisions,
or STAD (Slavin, in press; Slavin, Note 2). Both of these techniques
involve procedures that equalize the probability of success given
maximum effort for students of all levels of past achievement. In TGT,
this procedure is part of a tournament in which students compete as
individuals to contribute points to their team scores. The highest
three students in past performance compete with each other, the next
three compete with each other, and so on. A standard number of points
are awarded to the first-place winner in each of these three-person
competitions, so each student has the same chance (about one in three)
of contributing the maximum score to his or her team. This is in
contrast to the situation in traditional classes, where some students
cannot hope to make As or Bs no matter how hard they try, while others
can hardly avoid making high grades.

The same function is carried out in STAD by an achievement division
system, in which student scores on twice-weekly quizzes are compared to
the scores of others of similar past performance. The highest ranking
score among that group of equals earns maximum points, again regardless
of whether the comparison group is one of high or low past achievement.

Thus, both TGT and STAD contain a 
team component and a comparison-among- 
equals component. This study was designed 
to separate the effects of these two compo- 
nents on academic performance, mutual 
attraction, and student attitudes. This 
comparison is accomplished by means of a 2 
X 2 factorial design, where one factor is team 
reward versus individual reward, and the 
other is comparison with equals versus 
comparison with whole class. The com- 
parison-with-equals component is opera- 
tionalized as the achievement divisions of 
STAD. 

A long tradition of research has estab- 
lished the effect of team rewards on mutual 
attraction, positive attitudes toward the 
group task, and group member support for 
group goals (see Johnson & Johnson, 1974; 
Slavin, 1977a). In addition, when group 
members are individually accountable for 
their performance, team reward systems 
usually increase performance (Slavin, 
1977a). 

In this study, positive team effects are 
predicted for mutual attraction, attitudes 
toward school, incentive value of success, and 
peer support for academic performance. 
Because the team procedures do involve in- 
dividual accountability, effects are also 
predicted for academic performance, per- 
centage of time on task, and motivation. 

Atkinson (1958) and others have described motivation to perform a task
as the product of the probability of success at a task and the
incentive value of that task to the individual. Slavin (Note 3) has
extended this model to the prediction of maximum effort to the degree
that the probability of success given maximum effort (Ps/max) is
greater than the probability of success given minimum effort (Ps/min),
holding incentive value of success constant. The achievement division
is constructed to maximize the difference between Ps/max and Ps/min for
all students by rewarding students based on the rank of their quiz
scores among a group comparable in past achievement. That is, because
students are evenly matched, the chance of being the highest scorer in
an achievement division is very low if the student exerts only minimum
effort but much higher when the student exerts maximum effort. In a
traditional class, both Ps/max and Ps/min are high for very able
students and low for poor students, resulting in minimal motivation for
these students. Clifford (1971) used a system similar to achievement
divisions and found greater performance in the equal-comparison group
than in an unequal-comparison group or an individual reward condition.
The achievement division, or comparison-with-equals treatment, is thus
expected to increase academic performance, percentage of time on task,
perceived probability of success, motivation, satisfaction with school,
and the degree to which students feel that academic success depends on
their own performance (rather than luck).
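
As a rough numerical illustration of this argument (not a model taken from the source), the sketch below treats the incentive to exert effort as the difference between the two success probabilities, scaled by the incentive value. The function name, the multiplicative form, and every probability value are assumptions chosen only to make the contrast concrete.

```python
def effort_incentive(p_success_max_effort, p_success_min_effort, incentive_value=1.0):
    """Hypothetical index: gain in success probability from effort, times incentive.

    The multiplicative form is an assumption; the text only says that effort is
    predicted to the degree that the first probability exceeds the second,
    holding incentive value constant.
    """
    for p in (p_success_max_effort, p_success_min_effort):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return (p_success_max_effort - p_success_min_effort) * incentive_value


# Made-up probabilities (max effort, min effort), for illustration only.
SCENARIOS = {
    "traditional class, very able student": (0.95, 0.85),
    "traditional class, poor student": (0.10, 0.05),
    "achievement division, any student": (0.40, 0.05),
}

for label, (p_max, p_min) in SCENARIOS.items():
    print(f"{label}: {effort_incentive(p_max, p_min):.2f}")
```

Under these illustrative numbers the index is small in the traditional class for both very able and poor students and larger in the achievement division, which is the pattern the paragraph describes.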


Method 


Subjects 


Subjects were 205 seventh-grade students in eight 
intact English classes in the principal town of a rural 
Maryland county. All but three students were white.
Four teachers administered the treatments. 


Experimental Design 


The study used a 2 X 2 factorial design, varying re- 
ward structure (team vs. individual) and comparison 
group (comparison with entire class vs. comparison with 
equals, i.e., the achievement division). Each of the four 
teachers taught two classes. The teachers and classes 
were assigned to treatments in a counterbalanced 
fashion to distribute teacher effects equally across the 
main effects. That is, two teachers each taught one 
team-reward-entire-class-comparison class and one 
individual-reward-comparison-with-equals class, while 
two teachers each taught one team-reward-compari- 
son-with-equals class and one individual-reward-en- 
tire-class-comparison class. 
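
The assignment just described can be laid out explicitly to show what the counterbalancing does and does not achieve. In the sketch below the teacher labels T1 to T4 are hypothetical (the source says only that four teachers each taught two classes), and the +1/-1 contrast coding is a conventional choice rather than something taken from the source.

```python
from collections import defaultdict

# (teacher, reward structure, comparison group); teacher labels are hypothetical.
CLASSES = [
    ("T1", "team", "entire class"),
    ("T1", "individual", "equals"),
    ("T2", "team", "entire class"),
    ("T2", "individual", "equals"),
    ("T3", "team", "equals"),
    ("T3", "individual", "entire class"),
    ("T4", "team", "equals"),
    ("T4", "individual", "entire class"),
]


def contrast_codes(reward, comparison):
    """Return (reward, comparison, interaction) contrasts coded +1/-1."""
    r = 1 if reward == "team" else -1
    c = 1 if comparison == "equals" else -1
    return r, c, r * c


main_effect_sums = defaultdict(lambda: [0, 0])   # per teacher: [reward, comparison]
interaction_codes = defaultdict(set)             # per teacher: interaction values seen

for teacher, reward, comparison in CLASSES:
    r, c, rc = contrast_codes(reward, comparison)
    main_effect_sums[teacher][0] += r
    main_effect_sums[teacher][1] += c
    interaction_codes[teacher].add(rc)

for teacher in sorted(main_effect_sums):
    print(teacher, main_effect_sums[teacher], sorted(interaction_codes[teacher]))
```

Every teacher's main-effect contrasts sum to zero, so teacher differences cancel out of both main effects. The interaction contrast, however, takes a single value within each teacher (-1 for two teachers, +1 for the other two), so a Reward × Comparison interaction cannot be separated from a difference between the two pairs of teachers.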


Treatments 


All eight classes studied the same curriculum on the 
same schedule every day for 10 weeks. The curriculum 
was a unit on language mechanics, covering grammar, 
punctuation, and usage. All classes followed a regular 
schedule comprised of 40 minutes of teaching, followed 
by 40 minutes of student work-sheet work and a 20- 
minute quiz. Each cycle thus occupied two and one half 
40-minute periods, and two complete cycles occurred 
each week. The experimental manipulation took place
only during the work-sheet periods, as the teacher
presentations and quizzes were the same for all classes.

The four treatments resulting from the 2 X 2 design
were as follows:

1. Team reward and comparison with equals. This 
treatment is Student Teams-Achievement Divisions,
or STAD (Slavin, in press; Slavin, Note 2). Students 
were assigned to four- to five-member teams that con- 
sisted of students of all levels of past performance and 
both sexes. Teammates were assigned adjacent seats 
during all activities. They were encouraged to work 
together during the work-sheet periods to help each 
other learn the material. However, students took the 
quizzes individually and were given scores determined 
by their rank within homogeneous groups, as described 
below. These scores were then added into a team score. 
A weekly newsletter prepared by the teacher identified 
the successful teams and the team members who had 
contributed the greatest number of points (determined 
by rank within the homogeneous groups) to their team 
scores, thus providing them with public recognition for 
their performance. 

The comparison with equals, or achievement division, is a means of
insuring each student a roughly equal and substantial probability of
success if he or she exerts maximum effort. Initially, the highest six
students, as determined by past grades in English, were assigned to a
homogeneous comparison group (Division 1), the next six to another
group (Division 2), and so on. The students' scores on the two weekly
quizzes were summed
and then compared to the scores received by the other 
members of their division. The highest scorer within 
each of these groups earned eight points for his or her 
team, the second scorer earned six points, third scorer 
four points, and all others two points. The high scorer 
in each division was then “bumped” to the next higher 
division, where competition was likely to be more dif- 
ficult. When the highest division grew to nine mem- 
bers, it was split into a Division 1 and a Division 1A. 
This procedure maintained the equality within the di- 
visions over time and corrected mistakes in initial as- 
signment. Students did not interact with others in their 
divisions in any way and did not know which divisions 
their classmates were in. 
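
The achievement division procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the student names and quiz totals are invented, ties are broken alphabetically (the source does not say how ties were handled), and the splitting of a nine-member top division is not modeled.

```python
POINTS_BY_RANK = (8, 6, 4)  # first, second, and third scorer in a division
OTHER_POINTS = 2            # all remaining members
DIVISION_SIZE = 6           # initial division size


def initial_divisions(students_by_past_grade, size=DIVISION_SIZE):
    """Split a roster sorted from highest to lowest past grade into divisions."""
    return [students_by_past_grade[i:i + size]
            for i in range(0, len(students_by_past_grade), size)]


def rank_division(members, weekly_totals):
    """Order members by summed weekly quiz score, highest first (ties: by name)."""
    return sorted(members, key=lambda s: (-weekly_totals[s], s))


def division_points(members, weekly_totals):
    """Team points each member earns from his or her rank within the division."""
    ranked = rank_division(members, weekly_totals)
    return {s: POINTS_BY_RANK[i] if i < len(POINTS_BY_RANK) else OTHER_POINTS
            for i, s in enumerate(ranked)}


def bump_top_scorer(divisions, index, weekly_totals):
    """Return new divisions with the top scorer of divisions[index] moved up one."""
    updated = [list(d) for d in divisions]
    if index > 0 and updated[index]:
        top = rank_division(updated[index], weekly_totals)[0]
        updated[index].remove(top)
        updated[index - 1].append(top)
    return updated


# Invented roster, already ordered by past English grades.
roster = ["Ann", "Ben", "Cal", "Dee", "Eli", "Fay",
          "Gus", "Hal", "Ida", "Jo", "Kim", "Lee"]
divisions = initial_divisions(roster)

# Invented sums of the two weekly quizzes for Division 2.
totals = {"Gus": 17, "Hal": 19, "Ida": 12, "Jo": 15, "Kim": 9, "Lee": 14}
points = division_points(divisions[1], totals)
after_bump = bump_top_scorer(divisions, 1, totals)
print(points)
```

In this example Hal earns eight points for his team and moves up to Division 1, Gus earns six, Jo four, and the other three members earn two each.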

2. Team reward and comparison with entire class. 
This treatment was identical to Treatment 1 above, 
except that team scores were formed from the simple 
sum of the members’ quiz scores (number of items cor- 
rect). The weekly newsletter recognized successful 
teams and individuals who earned high scores. 

3. Individual reward and comparison with equals. 
Students worked individually at all times but received 
a newsletter recognizing those who had done well in the 
divisional competition for high scores. 

4. Individual reward and comparison with entire
class. Students worked individually and received
standard percentage scores on their quizzes. 


Dependent Measures 


Four categories of dependent variables were used. They are as follows:

Behavioral observation. During the last 5 weeks of the project,
behavioral observation of students was conducted in all classes. An
observer was trained to an interobserver reliability of .90, with the
experimenter noting whether students were (a) on or off task; (b) if on
task, working with a peer or alone; and (c) if off task, interacting
with a peer or not. Observations were made only during work-sheet
periods, and all other observations (such as interaction with staff,
out of seat with permission, or otherwise not expected to be on task)
were excluded from the analysis. Thus, the analysis is restricted to
students’ “task opportunities,” that is, periods during which on-task
behavior was clearly expected. Most off-task behavior involved
daydreaming, talking about things other than the task at hand, and so
on. On-task time is thus engaged time, that is, time spent actually
attending to the material. The observer watched each student in
sequence for 5 sec, sweeping the class several times in an observation
period. Dependent variables were percentage of time on task and
percentage of time spent interacting with peers.
Academic achievement. Academic achievement was 
measured on two separate tests: the Hoyum-Sanders 
Junior High School English Test (Hoyum & Sanders, 
Note 4) and a treatment-specific test covering the aca- 
demic material taught in class. Parallel forms of both 
tests were given as pre- and posttests. In addition, 
scores on the twice-weekly quizzes in the last 3 weeks 
of the program were used as academic achievement
measures. 
Attitudes. Eight four- to five-item attitude scales adapted from the
Learning Environment Inventory (Walberg & Anderson, 1968) were
administered as pre- and posttests. They were Satisfaction, Motivation,
Feeling of Being Liked, Liking of Others, Peer Support for Academic
Performance (e.g., “Other students care whether I do well or not in
this class.”), and Perceived Probability of Success (e.g., “If someone
does well in this class, it is because they worked hard.”). All scales
were presented in a Likert-type format, where students were asked to
strongly disagree, disagree, agree, or strongly agree with various
statements.
Sociometric measures. Students were asked to name their classmates who
were their “best friends in this class” and those who have “helped you
with your classwork.” Twenty-four spaces were provided for each
question, and students were allowed to name as many classmates as they
wished. The dependent variables of interest were the number of friends
and the number of helpers named by each student, taken to be an
indicator of mutual attraction and peer tutoring, respectively.


Results 


Behavior Observation 


The behavioral observation results were analyzed using 2 X 2 X 2
chi-square contingency tables. For percentage of time on task, the
factors were reward level, comparison group, and on task/off task; for
percentage of time interacting with peers, the factors were reward
level, comparison group, and peer interaction/individual work. All
other variables were analyzed using a general linear model approach
analogous to analysis of covariance, in which the incremental R² due to
treatment was tested for statistical significance (see Kerlinger &
Pedhazur, 1973). Because of the counterbalanced design, interaction
effects were completely confounded with teacher effects. However, only
two interaction effects significant at or beyond the .10 level were
discovered: interactions between reward level and comparison group on
the percentage of time on task and percentage of time tutoring peers.


Table 1
Behavioral Observation

                         Comparison with            Comparison with
                          entire class                  equals
Percent task          Individual     Team        Individual     Team
opportunities           reward      reward         reward      reward
On task (a)              72.8        92.4           82.2        92.8
Peer tutoring (b)        44.1        84.3           24.4        80.1

(a) For on task, χ² (Reward X Comparison X On-Off Task) = 3.28, p < .10;
χ² (Reward X On-Off Task) = 37.08, p < .001; and χ² (Comparison X On-Off
Task) = 4.61, p < .05.
(b) For peer tutoring, χ² (Reward X Comparison X Peer Tutoring) = 4.99,
p < .05; χ² (Reward X Peer Tutoring) = 191.85, p < .001; and χ²
(Comparison X Peer Tutoring) = 8.56, p < .01.


Table 1 summarizes the behavioral observation results. The table shows
that the team classes were on task a significantly greater proportion
of their task opportunities than the individual classes, χ²(1) = 37.08,
p < .001. Not surprisingly, the team classes peer tutored far more than
the individual classes, χ²(1) = 191.85, p < .001. The
comparison-with-equals classes were on task significantly more than the
entire-class-comparison classes, χ²(1) = 4.61, p < .05, but the
entire-class-comparison classes peer tutored more than the
comparison-with-equals classes, χ²(1) = 8.56, p < .01. This second
effect is due to a large difference between the frequency of tutoring
in the individual-reward-entire-class-comparison group (44.1% of task
opportunities) and the individual-reward-comparison-with-equals group
(24.4% of task opportunities). The reward and comparison group effects
thus support the experimental hypotheses for percentage of time on
task. The peer tutoring effect in favor of the entire class comparison
was not predicted, but it is probably due to the fact that the
achievement divisions are, after all, a competitive reward structure.
Because students did not know which students were in their divisions,
they were possibly reluctant to tutor anyone for fear that their own
rank in their division would suffer.


Academic Achievement 


The standardized and treatment-specific 
achievement measures were analyzed by 
means of an analysis of covariance, with their 
respective pretests as covariates. The quiz 
scores were similarly analyzed, using the
Hoyum-Sanders pretest as a covariate. 
None of the three measures of academic 
achievement showed any significant differ- 
ences between treatments. Thus, the ex- 
pectations of reward and comparison group 
effects on academic achievement were not 
supported. 

Table 2
F Ratios for Questionnaire Scales and Sociometric Measures

                                                  Factor (df = 1, 203)
Measure                                           Reward     Comparison
Questionnaire scale
  Satisfaction (5)                                  <1           <1
  Motivation (5)                                   3.92*        1.20
  Perceived Probability of Success (4)             3.26*         <1
  Incentive Value of Success (4)                    <1           <1
  Dependence of Outcome on Performance (4)         4.28**        <1
  Feeling of Being Liked (5)                       2.09         5.95**
  Liking of Others (5)                            12.80***      4.02**
  Peer Support for Academic Performance (5)       20.58***      4.40**
Sociometric measure
  Number of friends named                          4.80**       4.02**
  Number of helpers named                          2.88*        1.71

Note. Number of items in a scale appears in parentheses following scale name.
* p < .10.
** p < .05.
*** p < .01.


Table 2 summarizes the results of the eight attitude scales and the two
sociometric questions. All scales were analyzed by means of analysis of
covariance, with their respective pretests as covariates. The numbers
in parentheses after each scale name indicate the number of items in
the scale. The table shows no treatment effects for Satisfaction and
Incentive Value of Success. Reward effects in favor of teams were found
for Motivation, F(1, 203) = 3.92, p < .05, Liking of Others, F(1, 203)
= 12.80, p < .01, Peer Support for Academic Performance, F(1, 203) =
20.58, p < .001, Perceived Probability of Success, F(1, 203) = 3.25, p
< .10, and Dependence of Outcome on Performance, F(1, 203) = 3.28, p <
.05. In addition, reward effects in favor of teams were found on the
number of friends, F(1, 203) = 4.80, p < .05, and the number of
helpers, F(1, 203) = 2.88, p < .10, named. Thus, the predictions of
reward effects favoring team reward were supported for two of the three
mutual concern variables, Liking of Others and number of friends named,
as were the predictions for Motivation and Peer Support for Academic
Performance. In addition, reward effects favoring the team conditions
were found for Perceived Probability of Success and Dependence of
Outcome on Performance.

Statistically significant comparison group 
effects were found for four variables: Feel- 
ing of Being Liked, F(1, 203) = 5.95, p < .05, 
Liking of Others, F(1, 203) = 3.02, p < .05, 
Peer Support for Academic Performance, 
F(1, 203) = 4.40, p < .05, and number of 
friends named, F(1, 203) = 4.02, p <.05. All 
of these effects favored the comparison- 
with-equals treatments. On the other hand, 
comparison group effects were found for 
none of the scales for which effects were an- 
ticipated (Perceived Probability of Success, 
Satisfaction, Motivation, and Dependence 
of Outcome on Performance). 


Discussion 


In summary, the expected positive effects 
of team reward on academic performance 
were only partially supported. Participation 
in the team treatments increased the per- 
centage of time students spent on task but 
did not increase their academic achievement 
on weekly quizzes or final tests (controlling
for initial performance level). On the other
hand, the predicted positive effects of team 
reward on mutual concern, peer norms sup- 
porting academic performance, and moti- 
vation were supported. 

Except for the effect on percentage of time
on task, none of the predicted comparison
group effects was found, but there were un-
anticipated positive effects of the homoge-
neous group comparison on mutual concern
variables and peer support for academic 
performance. These effects may be due, 
ironically, to the competitive nature of the 
comparison-with-equals treatment as op- 
posed to the more individualistic nature of 
the entire class comparison. Johnson and 
Johnson (Note 5) have found that competi- 
tive reward structures have a more positive 
effect on interpersonal attraction than do 
individualistic reward structures; any rela- 
tionship seems to be better than none at all. 
However, the fact that the effects on the 
variables that should have been positively 
influenced by comparison with equals 
(principally perceived probability of success
and motivation) did not appear suggests that
the subjects’ perceptions of the achievement
division treatment varied considerably from
those assumed by the experimenter. Could
the students have seen the achievement di- 
vision treatment as nongraded rather than 
fairly graded? Could the achievement di- 
visions have influenced teachers’ perceptions 
of students, thereby influencing their be- 
havior in some unexplained way? Only fu-
ture research will tell.

One of the more surprising findings in this 
study was the failure to obtain academic 
achievement effects on either factor but 
particularly on the reward factor. Earlier 
research on TGT, a team technique similar 
to that used in this study, consistently found
positive effects of the teams on achievement.
The major difference between TGT and the
team techniques used in this study is that
TGT uses games instead of quizzes. This
difference could account for the different
findings. However, a study that compared
TGT with games to TGT with quizzes found
equal effects of these variations on achieve-
ment (Edwards & DeVries, Note 6).

A more plausible explanation is a differ-
ence between the individual-reward and
entire-group-comparison treatments in this
study and the control groups in the TGT 
studies. In the present study, students in all 
four treatments followed exactly the same 
activities, with the exception of the experi- 
mental manipulations. These activities 
involved a highly structured intensive 
schedule of teaching, studying, and testing. 
On the standardized test, all four groups 
gained at least three quarters of a grade 
equivalent in only 10 weeks, indicating that 
a good deal of learning was going on. In the
TGT studies, control teachers were given the
same work sheets and objectives as those 
given the TGT teachers, but they were not 
held to the same intensive schedule. It is 
possible that it is this schedule rather than 
the particular team or other structures that 
makes TGT effective in increasing academic
achievement. If so, this does not mean that 
TGT or STAD are less practically useful; 
their intensive schedules may be important 
in increasing achievement, while their team 
components are clearly important in in-
creasing such nonacademic variables as
mutual concern, cross-racial attraction, and
the like. However, this explanation would 
call into serious question the usual expla- 
nation of team effects on performance (see, 
e.g., Johnson & Johnson, 1974). 

One major conclusion to be drawn from 
this study is that the team component as 
opposed to the comparison among equals is 
the most important component of student 
team techniques for nonachievement out-
comes. Even where reward and comparison
group effects were found on the same vari- 
able, the reward effects were almost always 
larger. However, because neither reward 
nor comparison group effects were found on 
academic achievement, this study does not 
determine the relative importance of these 
components for increasing achievement. 
Finally, this study demonstrates once 
again the powerful effects of teams on social 
variables such as mutual concern and peer 
support for academic performance. For the 
practitioner, these may be the most impor- 
tant effects of all. These effects have been 
put to good use in special settings particu-
larly in need of greater mutual concern, such
as schools for disturbed adolescents (Slavin,
1977c) and racially integrated schools
(DeVries et al., 1978). However, we should 
not ignore the possible benefits that coop- 
erative team interactions could have on the 
socialization of all children. This study joins 
a long list of evidence that team interven-
tions can achieve such benefits while edu-
cating as well as or better than traditional in-
struction.
Verbal Encoding Effects on the Visual Short-Term Memory
of Learning Disabled and Normal Readers 


Lee Swanson 
University of Northern Colorado 


found in recall of nonverbal stimuli between normal and learning disabled
readers. These data suggest that primary reading deficits in learning disabled


Popular explanations for reading failure
in learning disabled children have been re-
lated to their difficulty in remembering vi-
sual stimuli. Specifically, visual-spatial
recall difficulties due to a generalized per-
ceptual deficit are believed by many inves-
tigators (e.g., Anapolle, 1967; Bender, 1938;
Hermann, 1959; Kephart, 1960; Orton, 1937)
to be a basic cause of reading disorders. The
assumption that learning disabled children's
visual short-term memory is faulty has
partly emanated from Orton's (1937) earlier
observations of reading retardates with
normal intelligence, who had difficulty re-
calling newly presented words. Current
studies (e.g., Bateman, 1968; Carroll, 1973;
Friedman, Note 1) have found a correlation
between visual short-term memory and
reading ability, giving support to Orton's
earlier observations. Proponents of the vi-
sual short-term memory deficit hypothesis
have focused upon difficulties in figure-
ground perception (Bender, 1938; Frostig,
1972), attention to visual arrays (Koppitz,
1973), and ocular-motor control (Leisman

in visual function- 


assumption 
that reading disabilities are the result of a 
perceptual deficit


Although the relationship between
memory for visual stimuli and reading ac- 
quisition is a logical one, Vellutino and col- 
leagues (Vellutino, Pruzek, Steger, & 

1973; Vellutino, to, 
& Phillips, 1975; Vellutino, Steger, & Kan- 


recent study (Vellutino et al., 1975), poor and

tions 

stimuli) and asked to demonstrate retention 

for these stimuli on three separate occasions 

immediately after initial presentation, 24 

hours later, and 6 months later. Poor 


cent findings reinforce the results of earlier
studies (Vellutino et al., 1972, 1973) dem- 
onstrating that poor readers sustain no basic 
dysfunction in visual perception and im-
mediate visual recognition. Their consistent
findings that poor readers do not differ from 
adequate readers on visual stimuli tasks raise 
serious questions as to the adequacy of a vi-
sual short-term memory deficit hypothesis
(as well as a perceptual deficit hypothesis) 
applied to learning disabled children. 
Further, an extension of these findings with 
learning disabled children might suggest 
that learning disabled children perform as 
well as normals on recall of nonverbal (un- 
familiar) stimuli but not as well with verbal 
stimuli. Although the Vellutino et al. (1972, 
1973, 1975) studies are inconclusive as to the 
precise nature of reading disorders, their 
findings indirectly support the possibility 
that reading difficulties may be associated 
with deficiencies in verbal encoding. At 
present, there are some suggestive data (e.g., 
Perfetti & Goldman, 1976; Perfetti & Ho- 
gaboam, 1976) that directly support the 
hypothesis that reading difficulties may be 
specific to linguistic aspects in contrast to 
visual aspects of word learning. 

The present investigation was designed to 
extend the findings of Vellutino et al. (1972, 
1973, 1975) by employing procedures that 
directly assessed the hypothesis that verbal 
encoding deficits (dysfunction in visual- 
verbal spatial integration) are related to 
reading disabilities in the designated popu-
lation.¹ More specifically, the present ex-
periment was designed to examine the sug- 
gested interaction of verbal encoding with 
nonmeaningful shapes and serial (visual- 
spatial) recall. It was predicted that visual 
memory in normal readers would be signifi- 
cantly improved by the use of verbal medi- 
ators to aid recall, while no such improve- 
ment was anticipated for poor readers. 


Method 
Subjects and Design 


A total of 60 children (normal and learning disabled),
separated into 4 groups of 15 each, matched on chro-
nological age, IQ, and sex, served as subjects in the ex-
periment. Ten males and 5 females were in each cell.
Additional children participated in the study; however,
matching procedures that included only learning dis-
abled children with specific reading difficulties ac-
counted for the small n in the statistical analysis.
Children were selected from regular or special education
(learning disability) classes in the Albuquerque, New
Mexico, and Columbia, South Carolina, public school
systems.

Learning disabled and normal readers were randomly 
assigned to either the named or unnamed condition 
while counterbalanced for sex. Computed mean IQ
scores from the Otis-Lennon intelligence test and mean 
chronological ages for the named condition were 101.6 
(SD = 7.85) and 9.4 (SD = .30) for disabled and 103.1 
(SD = 6.13) and 9.3 (SD = .43) for normal readers; in
the unnamed condition, mean IQ scores and ages were 
100.3 (SD = 6.93) and 9.4 (SD = .26) for disabled and 
104.1 (SD = 5.87) and 9.5 (SD = .31) for normal readers, 
respectively. The four groups did not differ signifi- 
cantly on IQ or chronological age. Classification of the 
learning disabled children was based upon the National 
Advisory Committee on Handicapped Children (Note 
2) definition and the Appraisal and Review Committees 
in the represented public schools. Further selection 
among learning disabled children was determined by 
reading scores below the thirtieth percentile equivalent 
based on Wide Range Achievement Test Reading sub- 
test scores (Jastak & Jastak, 1965) and subtest scores 
of Reading Recognition and Reading Comprehension 
of the Peabody Individual Achievement Test (Dunn & 
Markwardt, 1970). Selection criterion for normal 
readers was a standardized reading test score at grade 
level or above. Children were screened for visual-
motor, auditory, and visual acuity problems based upon
psychological and medical examination records. None 
of the children in this experiment was on medication. 


Stimuli 


Stimuli were six nonsense random shapes selected 
from Vanderplas and Garvin's (1959) 8-point assort- 
ment. The stimuli (Numbers 19, 20, 23, 24, 25, and 26) 
were chosen because of their low “association” and 
“content” values as established by Vanderplas and 
Garvin’s (1959) norms. Figures were of lower associa-
tion value than Ross and Youniss's (1969) nonverbal
item classification. Each of the shapes used was drawn
in black ink on a 10 × 10 cm white card with the same
dimensions as the normed shapes. Names were as-
signed to each of the shapes so that the name was a
meaningful representation of the shape to some extent
(see Figure 1). The names were arbitrarily chosen by
the experimenter but seem to be suggested by outlines
of the figures.


1 Learning disabilities in this study refers to children,
as defined by the National Advisory Committee on
Handicapped Children (Note 2), who have some type
of cerebral dysfunction. The definition would exclude
secondary problems resulting from sensory defects,
motor performance, mental defects, emotional prob-
lems, and environmental deprivation or other extrinsic
factors. A commonly used term for their reading
problem is dyslexia, while the general term reading
disabilities has correlates related to various instruc-
tional, sociocultural, psychological, psycholinguistic,
and physiological occurrences.
NONSENSE RANDOM SHAPES

Tooth (#19) Bird (#20) Tree (#23)
Rock (#24) Badge (#25) Helmet (#26)


Figure 1. Stimulus shapes and names. (Adapted from Vanderplas & Garvin, 1959.)


Procedure


Children in both reading groups were assigned for
participation in named and unnamed stimulus condi-
tions. Two groups (disabled and normal readers), two
training conditions (named and unnamed), and six se-
rial positions were used with repeated measures on serial
position. Children were trained and tested individu-
ally. During training, 15 children from both the
learning disabled and normal reading groups were
randomly assigned to the named condition, where they
learned names for shapes. Training trials followed in
which each of the six shapes was randomly shown for
2 sec. Children gave the name of each shape when it
was shown, with correction after 3 sec. Training con-
tinued until children could correctly name all shapes
within two consecutive trials. The remaining children
were assigned to the unnamed condition and were given
practice in discriminating top and bottom contours of
the shapes in a matching-to-sample task, but they were
not given names. Cards displaying each random shape
were presented side by side. The child was given a test
shape and asked to match the corresponding shape.
Approximate time for each child in either condition was
20 minutes. All children met pretesting criteria. Im-
mediately after pretraining, each child was tested on a
probe-type serial memory task. Posttest for named
shape associations indicated all children retained
labels.

A probe-type serial memory task procedure, similar 
to that used in other studies (Atkinson, Hansen, & 
Bernbach, 1964; Swanson, 1977; Swanson & Watson,
1976), was used in this investigation. The basic para-
digm involves the presentation of a series of items (e.g.,
familiar pictures) in serial order. The items are laid
down in a horizontal row in front of the subjects, where 
each item is exposed for a few seconds and then turned 
face down. Then, a probe item is displayed, and the 
subject’s task is to point to the card in the array that 
matches the probe. The main difference between the 
present and previous investigations was that coded 
unfamiliar stimuli were used, and a training period was 
given to meet the naming criterion. Six different 
shapes were presented for 2 sec each, one at a time. 
Shapes were then put in a facedown array after expo- 
sure. Once the stimulus array had been presented, a 
duplicate shape (probe item) was then shown, and the 
child was asked to point out the corresponding shape 
in the presentation series. 

Twelve trials were presented in each session. Each 
trial consisted of six shapes selected randomly for each 
serial position, with the stipulation that each position 
would be correct two times, and no shape would be 
correct more than twice over the 12 trials. Stimuli were 
presented from the child’s right to left, so that spatial 
and presentation positions were confounded. Ap-
proximate length of probe-type tasks was 30 minutes.
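
The trial constraints described above (12 trials, each serial position probed twice, no shape probed more than twice) can be sketched as a small schedule generator. This is only an illustration under assumptions: the article does not say how the randomization was actually carried out, and the function name `build_schedule`, the seed argument, and the use of the Figure 1 names as labels are hypothetical.

```python
import random

# Shape labels follow Figure 1; using them as identifiers is an assumption.
SHAPES = ["Tooth", "Bird", "Tree", "Rock", "Badge", "Helmet"]


def build_schedule(n_trials=12, seed=None):
    """Return trials in which every position and every shape is probed equally often."""
    n_shapes = len(SHAPES)
    if n_trials % n_shapes != 0:
        raise ValueError("n_trials must be a multiple of the number of shapes")
    reps = n_trials // n_shapes  # 2 when n_trials is 12

    rng = random.Random(seed)
    # Each serial position (0-based here) and each shape appears `reps` times as the probe.
    probe_positions = [p for p in range(n_shapes) for _ in range(reps)]
    probe_shapes = [s for s in SHAPES for _ in range(reps)]
    rng.shuffle(probe_positions)
    rng.shuffle(probe_shapes)

    trials = []
    for pos, probe in zip(probe_positions, probe_shapes):
        # Shuffle the five remaining shapes, then slot the probe into its position.
        others = [s for s in SHAPES if s != probe]
        rng.shuffle(others)
        order = others[:pos] + [probe] + others[pos:]
        trials.append({"order": order, "probe": probe, "position": pos + 1})
    return trials


schedule = build_schedule(seed=0)
for trial in schedule[:3]:
    print(trial["position"], trial["probe"], trial["order"])
```

With 12 trials and six shapes, "no more than twice" forces each shape to be the probe exactly twice, which is why the sketch balances shapes and positions in the same way.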


Results 


The proportion of correct responses con- 
stituted the overall memory measure. 
Proportion of correct responses by serial 
position is presented in Figure 2. To 
simplify the figure, adjacent serial positions 
were paired (1-2, 3-4, and 5-6). A three-
way analysis of variance for normal and
learning disabled readers, named versus
unnamed conditions, and six serial positions
(repeated measures) was performed. Scores
corrected for guessing (proportion of times
a position was chosen and was correct ÷
proportion of times that position was cho-
sen) exhibited the characteristically found
response bias for middle positions (see
Swanson, 1977).

Figure 2. Proportion of correct responses as a function of learning disabled and normal reading abilities,
named and unnamed stimuli, and serial positions.
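
The guessing correction described in the Results can be illustrated with a short sketch. It assumes each trial is recorded as a pair (position the child pointed to, position that was actually correct); that record format, the function name `corrected_scores`, and the sample data are hypothetical and are not taken from the study.

```python
def corrected_scores(responses, n_positions=6):
    """Per-position score: P(chosen and correct) / P(chosen).

    Both proportions share the same denominator (the number of trials),
    so the ratio reduces to a ratio of counts.
    """
    scores = {}
    for pos in range(1, n_positions + 1):
        chosen = sum(1 for picked, _ in responses if picked == pos)
        hits = sum(1 for picked, target in responses
                   if picked == pos and target == pos)
        # A position that was never chosen has no defined score.
        scores[pos] = hits / chosen if chosen else None
    return scores


# Made-up responses: (position pointed to, correct position).
example = [(1, 1), (3, 2), (3, 3), (3, 4), (6, 6), (6, 6)]
print(corrected_scores(example))
```

In this made-up example the child points to Position 3 on three trials but is right only once, so the corrected score for Position 3 is one third even though its single probe was answered correctly. That is how the measure discounts a bias toward middle positions.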

Differences between disabled and normal 
readers indicated that recall of probed serial 
positions was more difficult for the learning 
disabled than normal children, as reflected 
in the task main effect, F(1, 56) = 10.92, p <
.01. The main effect of name training was
significant, F(1, 56) = 7.81, p < .01, but
group effects (as shown in Figure 2) were
obscured by an interaction of Group × Task
(named vs. unnamed) training, F(1, 56) =
20.57, p < .001. Tests of simple main effects
indicated that name training favorably in-
fluenced performance on the recall task by
normal readers, F(1, 280) = 3.71, p > .05. As
is shown in Figure 2, a significant main effect
on group reading ability was found for the
named condition, F(1, 56) = 54.66, p < .001,
while no significant difference in recall per-
formance was found between reading groups
for the unnamed conditions (F < 1). Thus,
the conclusion of Vellutino et al. (1975) that
reading disability cannot be attributed to a
deficiency in visual memory was con-
firmed.
The main effect of serial position was
significant, F(5, 280) = 4.18, p < .01, while
the interaction of Condition × Serial Posi-
tion did not reach significance. New-
man-Keuls comparison (means collapsed
across group and condition) of all pairwise
differences on the recall task indicated that
the recency positions (paired comparison:
Positions 5 and 6 > 3 and 4) accounted for
the significant (p < .01) serial position
effects. A reanalysis of responses
converted to d’ measures (see Hochhaus,
1972; Swanson, 1977), using the same
statistical procedures, yielded similar main,
interaction, and simple main effects.
Analysis of response biases, using the
index (see Swanson, 1977), indicated a typ-
ical biased choice pattern for the middle se-
rial positions. Thus, it is apparent that on
the verbal and visual recall task employed in 
the present study, no general response pat- 
tern accounting for recall differences unique 
to either the learning disabled or the normal 
readers is evident. 


Discussion 


Vellutino et al. (1973, 1975) have consis- 
tently argued against the visual-memory 
deficit hypothesis of reading dysfunction. 
Present experimental results provide repli-
cation of the Vellutino et al. findings on a
visual recall task. Learning disabled chil- 
dren with specific reading problems per- 
formed as well as normals on a nonverbal 
visual-spatial short-term memory recall 
task, thereby suggesting that visual memory 
is not a significant cause of a specific reading 
disability. Learning disabled readers were 
also comparable to normal readers in serial
recall response patterns. 

In contrast, labels of nonfamiliar random 
shapes led to significant improvement in 
recall for normal readers, while the effects of 
labels on learning disabled children’s recall 
were negligible. Although not significant, 
recall was better in the unnamed than 
named condition for learning disabled 
readers. This finding supports the hy- 
pothesis that primary reading disability is 
attributable to a dysfunction in visual-ver- 
bal integration and not to visual learning 
difficulties, as suggested by the perceptual 
deficit hypothesis (e.g., Anapolle, 1967; Ke-
phart, 1960; Koppitz, 1973; Orton, 1937).

These data are perhaps best interpreted 
overall by assuming that labels serve to in- 
tegrate a visual representation and improve 
the retrievability of the various stimulus 
features for normal readers, while the use of 
labels in the case of learning disabled readers 
can be viewed as a mediator to which stim- 
ulus features were not associated. Naming 
for these children did not provide the basis 
for an integrated representation that served 
to increase the retrievability of visual infor-
mation. Some support for this finding
comes from literature that suggests that a
reading disability is a manifestation of spe-
cific linguistic problems such as phonemic
segmentation (e.g., Shankweiler & Liber-
man, 1972), verbal concept deficiency (e.g.,
Blank & Bridger, 1967), or mediational in- 
efficiencies (Swanson, 1977). 

With respect to their practical applica- 
tions, the results of this study seriously 
question the validity of commonly employed 
remediation techniques for improving visual 
perception in learning disabled readers. 
Thus, the poor record of visual perceptual 
training programs based upon the perceptual 
deficit hypothesis may be attributed to the 
fact that learning disabled children are not 
perceptually impaired. Further, it would 
seem that the role of visual-spatial short- 
term memory in producing reading retar- 
dation has been overestimated. There is 
evidence that such dysfunction and reme- 
diation procedures may characterize the 
young child, but even that evidence is mixed 
(e.g., see Vellutino et al., 1972, Discussion 
section). Finally, I would like to emphasize 
that the present findings can only be gener- 
alized to learning disabled children within 
the age and grade ranges that characterized 
the research sample. 
Teacher-Student Interactions in Desegregated Schools


Stephen B. Hillman and G. Gregory Davenport 
Wayne State University 


As part of the desegregation plan in the city of Detroit, teachers in recently
desegregated schools were involved in an in-service program. As part of this
program, teacher-student interaction data were collected in each teacher's
classroom using the Brophy-Good Dyadic Interaction Observation System.
These data were standardized for each classroom to produce an index of the
extent to which the allocation of instructional opportunities was proportionate
to the distribution of students in the class. The results of this study indicated
that black students and males received a greater proportion of the classroom
interactions than did white students or females, and that both male and fe-
male teachers acted in very similar ways with male and female students.


Teacher-student interactions in the 
classroom are at best uneven, with some 
students receiving greater quantities of 
teacher contact than others (Good, 1970; 
Jackson & Lahaderne, 1967; Kranz, Weber, 
& Fishell, Note 1; Mendoza, Good, & Bro- 
phy, Note 2). Several studies have also 
shown some students to receive quantita- 
tively superior treatment from their teachers 
(Brophy & Good, 1970, 1974; deGroat & 
Thompson, 1949; Good & Brophy, 1972; 
Rist, 1970; Rowe, 1969; Silberman, 1969). 
As has been pointed out elsewhere (Good & 
Brophy, 1971), previous investigators have 
consistently been able to demonstrate the 
effects of differential teacher behavior 
toward students differing on characteristics 
such as achievement level, sex, or socioeco- 
nomic level. These kinds of studies acquire 
particular significance when extended to 
situations involving the variables of student 
and teacher race. 

Though the Brown v Board of Education¹
school desegregation decision has had wide 
impact with regard to the integration of 
American schools for the purpose of pro- 
viding for equal educational opportunity, it 
continues to remain an unanswered question 
as to whether or not black and white children 
receive the same quantity and quality of in- 
struction even though they are in the same 
classroom. Previous research on a number 
of other student characteristics as they affect 
instruction clearly suggests that race may be 
an extremely important variable. Indeed, 
several studies have already examined 
teacher and student racial and ethnic vari- 
ables as they influence the quantity and 
quality of classroom interaction (e.g., Byalick 
& Bersoff, 1974; Jackson & Cosca, 1974; 
Rubovits & Maehr, 1973; Gay, Note 3; U.S. 
Civil Rights Commission, Note 4). 

Rubovits and Maehr (1973) report what 
they call a “disturbing instance of white ra- 
cism” in that black students in their sample 
were given less attention, were ignored more, 
praised less, and criticized more than white 
students by the sample of white teachers. 
Their results indicated that white students 
received far more attention in general than 
did the black students. Using a sample of 
both white and black teachers, Byalick and 
Bersoff (1974), in their study of reinforce- 
ment practices in integrated classrooms, 
found that teachers reinforced opposite race 
children more frequently than they did 
children of their own race. 

1 Brown v Board of Education of Topeka, 347 U.S.
483 (1954).

A study by Jackson and Cosca (1974)
using a modified version of the Flanders
System of Interaction Analysis reported
finding significant disparities in favor of
Anglo versus Mexican American students on 
each of the following six variables: teachers’ 
use of praise, acceptance or use of students’ 
ideas, teacher questioning, teachers’ giving 
of positive feedback, all noncriticizing 
teacher talk, and all student speaking. Both 
Anglo and Mexican American teachers were 
in the sample in this study, and both groups 
were found to provide more favorable 
treatment to Anglo American students than 
to those who were Mexican American. 
Gay’s (Note 3) research on teacher be- 
havior with black and white students dem- 
onstrated that all teachers act similarly in 
differentiating their verbal behaviors with 
black and white students, that black stu- 
dents did not participate as often as white 
students in class discussions, and that white 
students participated in more academic and 
substantive ways and received more en- 
couragement and praise from teachers, while 
blacks participated more in procedural and 
behavioral or discipline interactions. Ac- 
cording to Gay (1975) it made little differ- 
ence whether teachers were black or white, 
or teaching elementary or secondary classes; 
they expected the quality of white students’ 
classroom participation to be better than 
that of black students. 

Aware of the research findings that indi- 
cated student ethnicity to be a major deter- 
minant of teachers’ expectations and inter- 
actional behaviors and the results of a local 
survey (Detroit Public Schools, Note 5) 
suggesting that teachers did not believe that 
they had different expectations for black and 
white students, and faced with a court-or- 
dered desegregation plan to be implemented 
in February 1976, the Detroit Public Schools 
undertook a large-scale in-service program 
through which it hoped to insure the equal 
treatment of both black and white stu- 
dents. 

This In-Service Training Program for 
Detroit teachers in recently desegregated 
schools took place in four stages. During the 
first stage, 1,500 teachers from 80 schools 
attended, on a voluntary basis, one of five 
weekend meetings. The purpose of these 
meetings was to deal with the effects of 
teacher expectations, beliefs, and attitudes 
on pupil behavior. More specifically, these 
meetings focused upon teaching in a multi-
racial, multiethnic school system with pre-
sentations and exercises having knowledge
and attitude as opposed to skill development
objectives. The major purpose of the
weekend workshops was to establish enough
rapport between the teachers, the meeting
leaders, and coder-observers so that
teachers would be willing to participate in
what was expected to be the major part of
the treatment and allow themselves to be
observed while teaching a lesson to the stu-
dents in their classrooms.

Following these weekends, trained ob-
servers entered the classrooms of the par-
ticipating teachers and coded the interactions
between these teachers and their students.
The participating teachers represented all
grade levels, kindergarten through twelfth
grade. The observation system, a modified
version of the Brophy-Good Interaction
Coding System (Brophy & Good, 1970),
produced descriptive information on the
nature of this teacher-student interaction,
with specific information concerning teacher
questioning patterns, feedback methods,
reinforcement, and criticism patterns as well
as indices of pupil behavior and misbeha-
vior.

Following this initial observation the
descriptive data were shared with each of the
teachers as a way of describing to them the
nature of their interaction with their stu-
dents. Previous research by Good and
Brophy (1974) has shown that this form of
feedback can be very helpful in producing
changes in teacher behavior where neces-
sary.

Following this feedback coders
reentered these classrooms in order to make
another observation of teacher-student in-
teraction in an attempt to determine to what
extent feedback to the teacher had affected
their interaction patterns. The data re-
ported in this study include only those col-
lected during the first set of classroom ob-
servations; they are descriptive interaction
patterns in a multiracial urban setting as
well as a set of preobservations or baselines to
be compared at a later time with the second
set of observations collected after the feed-
back intervention aspect of the in-service
program.
Table 1
Distribution of Teachers in Sample by Sex, Race, and Grade Level

                         Black         White
Grade level             M     F      M     F     Total
Elementary (K-5)        3    84     11    60      158
Middle school (6-8)    12    40     22    25       99
High school (9-12)      4    18     15    12       49
Total                  19   142     48    97      306

Note. M = male; F = female.


Method 


Sample 


Usable data were obtained from 306 classrooms re- 
cently affected by the Detroit court-ordered desegre-
gation. This included the classrooms of 158 elementary
school teachers, 99 middle school teachers, and 49 high 
school teachers. There were 161 black teachers and 145 
white teachers; 67 were male, and 239 were female. 
Table 1 presents a further breakdown of the teachers 
by sex, race, and grade level. The sample of teachers 
was heterogeneous in terms of age, experience, and 
subject matter taught. The average age of the teachers 
and years of teaching experience were 37.48 years (SD 
= 11.15) and 12.04 years (SD = 8.75), respectively. 

White teachers tended to be older (M = 40.83) than
black teachers (M = 34.97), and white teachers tended
to have more years of teaching experience (M = 15.00)
than black teachers (M = 9.70). While subject matter
taught by teachers was not a major concern of this 
study, there was considerable variation in the academic 
subjects taught during the classroom observations. 
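
The sample counts reported above can be cross-checked against the cells of Table 1 with a few lines of arithmetic. The sketch below is only a consistency check, not part of the original analysis. The two white-teacher cells for middle school (22 and 25) are assumptions inferred from the column totals, because those digits are illegible in the scanned table; the dictionary keys are hypothetical labels.

```python
# Cell counts as read from Table 1. The middle school white_m and white_f
# values are inferred from the column totals (assumption, see lead-in).
counts = {
    "Elementary (K-5)":    {"black_m": 3,  "black_f": 84, "white_m": 11, "white_f": 60},
    "Middle school (6-8)": {"black_m": 12, "black_f": 40, "white_m": 22, "white_f": 25},
    "High school (9-12)":  {"black_m": 4,  "black_f": 18, "white_m": 15, "white_f": 12},
}

# Row totals per grade level and column totals per race-by-sex cell.
row_totals = {level: sum(cells.values()) for level, cells in counts.items()}
col_totals = {
    key: sum(cells[key] for cells in counts.values())
    for key in ("black_m", "black_f", "white_m", "white_f")
}

grand_total = sum(row_totals.values())
black = col_totals["black_m"] + col_totals["black_f"]
white = col_totals["white_m"] + col_totals["white_f"]
male = col_totals["black_m"] + col_totals["white_m"]
female = col_totals["black_f"] + col_totals["white_f"]

print(row_totals)
print(col_totals)
print(grand_total, black, white, male, female)
```

Under these cell values the marginals agree with the figures given in the Sample paragraph: 158, 99, and 49 classrooms by level, 161 black and 145 white teachers, and 67 male and 239 female teachers, for 306 in all.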


Data Collection 


All teachers who attended one of the several weekend 
meetings were approached by the trained coders who 
were part of the weekend meeting staff to schedule an
observational time for the following week. The nature
of the classroom observations was explained to teachers
as an opportunity to gain more knowledge about their
classroom interaction patterns and instructional styles. 
Teachers were told that the data from individual observations
could only be meaningfully interpreted relative to each
teacher's lesson goal and that the data were
most meaningful to teachers only when collected during 
an uncontrived teacher-student lesson exchange. 

Coders went to teachers' classrooms according to the 
prearranged schedule and were generally introduced by 
teachers to the students as "someone wanting to observe 
the class" and were seated in an unobtrusive position 
to the side of the classroom. After briefly familiarizing 
themselves with the classroom procedures and with the 
subject of discussion, the coders would record the date,
subject matter, time, teacher sex and race, and student
sex-race composition in the class, and then they began
to code teacher-student verbal interactions.

Only classroom observations of 10 minutes or longer were included
in the data analysis, with the length of classroom observations
ranging from 10 minutes to 43 minutes with a mean observation time
of 21.79 minutes and a standard deviation of 6.65 minutes. Length of
observation was quite variable due to the naturally occurring length
of the classroom material, the extent to which a given teacher was
cooperative with the project, and the observers' desire to obtain as
complete and representative a sample as possible.

The observational instrument was a modified version 
of the Brophy-Good Dyadic Interaction Observation 
System (Brophy & Good, 1970). This system yields a 
variety of qualitative and quantitative measures of 
student-teacher verbal interactions, separately recorded for each
student in the class. The coding procedure was modified for this
study in order to distinguish among behaviors associated with
individual students of various ethnic groups. Only public, verbal
behaviors that were directed to or from individual students during
lessons involving the entire class were coded. Thus, activities such
as seat work, test taking, and various classroom organizational
activities were omitted from coding. Each time an interaction was
coded, the sex and race of the student participating in that
interaction were also coded.

While the Brophy-Good Dyadic Interaction Obser- 
vation System is generally well known, it should be 
pointed out that the system records three basic types 
of teacher-student interactions. Categories 1-13 refer 
to academic response opportunities. Of the academic 
response opportunities, the number of process questions 
and the number of product questions are categories of 
types of teacher questions. Process questions require 
students to verbally explain the problem-solving steps 
or strategies used in arriving at a conclusion, whereas 
product questions require a single word or short answer 
from students usually reporting facts from memory. 

Categories 14-18 refer to teacher questions or state- 
ments dealing with routine classroom management and 
procedures, and categories 19-24 refer to student-ini- 
tiated interaction. Most of the teacher-student in- 
teraction variables are self-evident from their titles. 

Reliability for the 14 coders was obtained by having 
each of the observers code a 15-minute videotape re- 
cording of a fifth-grade math lesson. Although this was 
not the most desirable method, it was the only one 
available for this particular study. Reliability was 
computed as the number of agreements divided by the 
number of agreements plus disagreements plus omis- 
sions multiplied by 100 for each pair of observers. The 
average reliability was 80%. The primary reason for the 
low reliability was the difficulty encountered by the 
observers in attempting to code the sex of the student. 
This was particularly difficult because the videotape 
camera was situated in the back of the room and voice 
tone was often the only cue possible in obtaining the sex 
identification. Observers reported that they had no 
problems coding the race and sex variables in the 
classroom setting. 
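
The reliability index described above can be illustrated with a short sketch. The function names and the tallies below are hypothetical and are not taken from the study; the sketch only assumes that each pair of observers yields counts of agreements, disagreements, and omissions, and that the reported figure is the mean of the pairwise percentages.

```python
def percent_agreement(agreements, disagreements, omissions):
    """Agreements / (agreements + disagreements + omissions) * 100."""
    total = agreements + disagreements + omissions
    if total == 0:
        raise ValueError("no coded events to compare")
    return 100.0 * agreements / total


def mean_pairwise_reliability(pair_counts):
    """Average the percent agreement over every pair of observers."""
    scores = [percent_agreement(*counts) for counts in pair_counts.values()]
    return sum(scores) / len(scores)


# Hypothetical tallies for three coders; these are not data from the study.
example_counts = {
    ("A", "B"): (40, 6, 4),
    ("A", "C"): (42, 5, 3),
    ("B", "C"): (38, 8, 4),
}
print(mean_pairwise_reliability(example_counts))
```

Because omissions are counted in the denominator, an event that one observer missed entirely lowers the index in the same way as an outright disagreement.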


Data Preparation and Analysis 


The raw data from the Brophy-Good (1970) Dyadic Interaction
Observation System were modified to allow for the analysis of
possible disproportionate instructional opportunities among teachers
and students of different racial groups. Raw scores of each category
of student sex and race were transformed into a standardized score
based on that group's representation within a given observational
category proportionate to its representation of students in the
classroom. The standardized scores were calculated by using the
following formula:


\[
\text{Standardized score} =
\frac{\text{Total no. interactions for variable } x \text{ in a given student sex-race category}}
     {\text{Total no. interactions for variable } x \;\times\;
      \dfrac{\text{Total no. students in sex-race category}}{\text{Total no. students in class}}}
\]


where variable x equals, for example, a response op- 
portunity category such as product question. 

Calculations of these standardized scores were done 
only in instances where a particular interaction obser- 
vational category occurred during the classroom ob- 
servations and where students of a particular sex-race 
category were present at that time. 
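
A minimal sketch of this standardization follows, under the assumption that the score is a group's share of the interactions divided by its share of the students in the class. The function name, argument names, and the example class are hypothetical. Returning `None` when no interactions occurred or no students of the category were present mirrors the restriction stated above.

```python
def standardized_score(group_interactions, total_interactions,
                       group_students, class_size):
    """Share of interactions divided by share of students for one group."""
    if total_interactions == 0 or group_students == 0:
        return None  # score is not calculated in these instances
    interaction_share = group_interactions / total_interactions
    student_share = group_students / class_size
    return interaction_share / student_share


# Hypothetical class of 30 students and 24 product questions.
# Each entry is (students in category, product questions received).
example_class = {
    "black male": (10, 12),
    "black female": (8, 6),
    "white male": (6, 4),
    "white female": (6, 2),
}
class_size = sum(n for n, _ in example_class.values())
total_questions = sum(q for _, q in example_class.values())

scores = {
    group: standardized_score(q, total_questions, n, class_size)
    for group, (n, q) in example_class.items()
}
for group, score in scores.items():
    print(group, round(score, 3))
```

A score above 1.0 means the group received more of that interaction than its share of the class would predict, and a score below 1.0 means it received less. Weighting each group's score by its share of the students always gives exactly 1.0 within a class, which is why the scores cannot carry differences between teachers.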


Results 


The effects of student and teacher sex and 
race on teacher-student interactions were 
examined in a series of four-way analyses of 
variance. Five of the student-teacher in- 
teraction categories were eliminated from 
the analyses, however, because the frequency 
of occurrence of behaviors in these categories 
was too low for meaningful statistical anal- 
ysis. The categories eliminated were: 
teacher ignores student behavior, teacher 
nonintervenes in student behavior, teacher 
praises student behavior, teacher selects 
inappropriate target for student discipline, 
and teacher criticizes student-initiated 
question. The first four were nonacademic 
student-teacher interaction categories, while 
the fifth was a student-initiated behavior. 

Table 2 shows the number of standardized
dependent variables (out of a total of 19) for
which each effect reached statistical
significance (p < .05). To illustrate the impact of
the various main effects and interactions, the 
binomial probability for obtaining n/19 
repeated significant tests is also shown in 
Table 2. (This binomial probability should 
be interpreted cautiously, however, since to 
a degree, the dependent variables were cor- 
related with each other.) 
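
As an illustration of the binomial reasoning, the sketch below computes the chance of obtaining at least n significant results out of 19 tests when each test has a .05 probability of reaching significance by chance alone. It assumes that the tests are independent and that the reported probability is an upper-tail probability; neither assumption is confirmed by the text, and the function name is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p=0.05):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance probability of at least k significant tests out of 19.
for k in (1, 3, 5, 7, 9):
    print(k, binomial_tail(k, 19))
```

Because the dependent variables were correlated, the true probabilities are larger than these independent-trial values, which is the reason for the caution expressed above.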


For the standardized variables, the number of total main effects
reaching the .05 level (9 of 19) was in itself significant based on
the binomial theorem (p < .0001). On the one-way tests, sex and race
of student proved to be potent classifying variables for nonacademic
behaviors. In every case where student sex was found to be a
significant variable, males received a greater proportion of the
variable than females (e.g., males received a greater proportion of
product questions than females). A similar consistency was found for
the race variable, with black students engaging in instructional
activities to a greater extent than white students on each dependent
variable that was significant in the analysis of variance.

The mathematical computation of the standardized variables
suppressed differences between teachers. The standardizing was done
within each individual teacher's class rather than between the
classes of different teachers. Where one standardized variable was
low (or high) for a particular student race-sex combination in a
specific teacher's class, other student race-sex combinations for
that variable had to be high (or low) in roughly an equal but
opposite direction. For any given standardized variable, after
allowing for rounding and skewedness errors, the mean for all
possible student race-sex combinations in any given teacher's class
would be 1.0; thus, the analyses of variance would be unable to
discern any significant main effects based solely on teacher sex or
race. The tests could, of course, still detect interactions between
student and teacher characteristics. The main statistical advantage
of the standardized variable is to provide an accurate picture of
the first-order interaction effects in a way that controls for the
variations in rate due solely to teacher characteristics.

Two of the two-way interactions were significant often enough that
the number of significant tests was in itself significant: Race of
Student X Race of Teacher (7 of 19 tests significant, p < .0001
under the binomial theorem), and Race of Student X Sex of Teacher
(5 of 19 tests significant, p < .00… under the binomial theorem).
In both cases the majority of the significant two-way interactions
occurred in the academic response dependent measure variables.

Table 3
Statistically Significant (p < .05) Comparisons on the Newman-Keuls Tests

                               Group
Dependent variable      Low mean   High mean       q       df
Product questions       WT-WS      WT-BS         3.753   4,1122
Incorrect answer        WT-WS      WT-BS         3.984   4,871
Repeated question       WT-WS      WT-BS         3.726   4,589
Product questions       FT-WS      FT-BS         3.764   3,1122
Correct answer          MT-BS      FT-BS         3.805   4,1116


Note. W = white; B = black; S = student; T = teacher; M = 
male; F = female. 


A series of Newman-Keuls (Winer, 1971)
contrast tests was performed on significant
two-way interactions. These tests compared 
the differences between means for each of 
the six possible comparisons for the four 
groups entering into the significant two-way 
interactions. 
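
The count of six comparisons follows from pairing every two of the four cells in a two-way interaction. The sketch below enumerates them for a teacher-race by student-race interaction. It borrows the group abbreviations from the note to Table 3 and adds BT (black teacher) by analogy; the function name is hypothetical.

```python
from itertools import combinations


def pairwise_comparisons(teacher_levels, student_levels):
    """List every pair of cells in a two-way interaction."""
    groups = [f"{t}-{s}" for t in teacher_levels for s in student_levels]
    return list(combinations(groups, 2))


# WT/BT = white/black teacher, WS/BS = white/black student.
pairs = pairwise_comparisons(["WT", "BT"], ["WS", "BS"])
for low, high in pairs:
    print(low, "vs.", high)
```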

The results of these tests are presented in Table 3. In general,
they tend to show a cross-race effect, with the group means being
lower for teachers of the same race as the student than for teachers
of a different race from the student. All of the significant results
presented in Table 3 are on academic variables and occurred on just
three dependent variables: product questions, student gave incorrect
answer, and teacher gave answer. Newman-Keuls tests on the
student-behavioral and student-initiated variables produced no
significant results.

The pattern of differences detected in the significant Newman-Keuls
tests was almost universally present in all 13 of the academic
variables, even though the statistical tests were not significant on
all of them. The cross-race means were almost always higher than the
same-race means.


Discussion 


Main effect analyses showed that where a main effect was significant
for race of student, in every case black students received a greater
proportion of the variable than white students. Thus blacks received
a greater proportion of product questions, gave no response to more
questions, received more criticism from teachers for their behavior,
had more self-initiated questions or relevant comments, were the
recipients of greater teacher nonacceptance of a student question or
response, and received more teacher feedback to a student question
or response. This finding is in conflict with the results presented
by Gay (1975) and by Rubovits and Maehr (1973), which showed that
white students received far more attention from teachers than black
students, as well as the data presented by Jackson and Cosca (1974),
which indicated that Anglo American students received greater
teacher attention than Mexican American students in the same
classroom. In the Rubovits and Maehr (1973) study the teachers were
all white preservice teachers with the data collected in a specially
contrived experimental situation, whereas the teachers in this study
were all working teachers of which 53% were black. In the Jackson
and Cosca (1974) study the minority group population was Mexican
American, whereas in the present study the minority group population
was black. It is not clear whether this factor is responsible for
the contradictory findings or whether some other variables such as
the exclusively urban nature of the present sample or the racial
make-up of the average classroom in the sample are responsible for
the effects. In most other classroom studies where race has been an
important variable, the effects have not been analyzed in terms of
student race (e.g., Barnes, 1973; Byalick & Bersoff, 1974), and few
other studies exist to clarify these conflicting results. More data
will be needed before this effect is understood.

The main effect analysis of the sex variable showed that male
students participated in a greater proportion of instructional
activities than female students. This was true for each of the
following 12 variables for which statistical significance was
obtained: product question, students do not volunteer, students do
volunteer, student gives correct answer, student gives incorrect
answer, teacher criticizes student answer, teacher asks a new
question, teacher criticizes behavior, student asks a question or
makes a relevant response, student asks an irrelevant question or
makes an irrelevant response, teacher doesn't accept a student
question or response, and teacher gives feedback to a student
question or response. These results are highly consistent with those
obtained by Good, Sikes, and Brophy (1973). The variables on which
significance was obtained also illustrate that male students
initiated more instructional contact from teachers and that teachers
initiated more instructional contact with males than with females.

When these results are examined from the point of view of both
positive and negative behaviors on the part of both teachers and
students, a very unclear picture emerges. Male students are, for
example, significantly more active than their female counterparts in
answering correctly in class (a presumably positive behavior) but
are also more likely to answer incorrectly (a presumably negative
behavior). Thus it is difficult to discuss specific student or
teacher behaviors in terms of their ultimate impact upon the
classroom environment and learning outcomes. The same holds true for
the nonacademic and student-initiated behavior variables where, for
example, male students initiated both more relevant and more
irrelevant responses and questions.

Two variables (teacher asks more product questions, and teacher
criticizes student behavior) were found to occur significantly more
often for both black and male students. If these two variables are
viewed as behavior management techniques, then it may be
hypothesized that teachers react to the high activity level of these
students by asking them more specific, focused, behavior-controlling
types of questions and by criticizing their behavior more often in
an attempt to maintain classroom control. Further research will be
needed to look into this possibility. The lack of the use of teacher
praise for student behavior is also significant in its absence as a
behavior control technique. This variable was not included for
statistical analysis, as its frequency of occurrence was too low.

The lack of any significant two-way interactions involving the
teacher sex and student sex variables suggests clearly that while
male and female students behave differently in the classroom, male
and female teachers treat male and female students similarly. Thus
the same pattern of greater activity by males occurs in the
classrooms of both male and female teachers, and the same pattern
which has been shown to occur repeatedly with female teachers (Good
et al., 1973) also occurs with male teachers. The arguments of some
educators calling for the sexual balancing of teaching staffs based
upon the notion of differential teacher behavior as a function of
teacher and student sex variables (Grambs & Waetjen, 1966; McNeil,
1964; Peltier, 1968) derive no support from the present data.

Analysis of the Race of Teacher X Race of 
Student interaction showed that black stu- 
dents of white teachers as compared with 
white students of white teachers received 
greater proportions of the following vari- 
ables: product questions, student gave in- 
correct answer, and teacher repeated the 
question. On all of the other possible in- 
teractions, no significant differences were 
obtained, suggesting that on the whole the 
interaction patterns between black and 
white teachers and black and white students 
are far more similar than they are different. 
Similar findings have been reported by other
researchers (e.g., Barnes, 1973; Mangold,
1974), who observed only a very small number of
significant differences in the interaction of
teacher and student races.

It is the case, however, that the cross-race pattern found in the
significant Newman-Keuls tests was universally present, though not
significant, in all of the 13 academic variables. This pattern
should be more closely examined in future research in this area, as
it is consistent with the findings of other research (Brown, Payne,
Lankewich, & Cornell, 1970; Byalick & Bersoff, 1974). One possible
explanation for its occurrence in this study would be the
possibility that white teachers overcompensated in their
interactions with black children in an attempt to make the patterns
appear to be equal. Though the teachers did not know the details of
the observation system or the particulars of what the observers were
looking at, surely they had the expectation that in recently
desegregated schools, the instructional opportunities presented in
the classroom should be distributed proportionately among black and
white students.

The Race of Student X Sex of Teacher
interaction resulted in the finding that black 
students of female teachers received more 
product questions than white students of 
female teachers and that black students of 
female teachers gave more correct answers 
than did black students of male teachers. 
Because of the small number of effects that 
were found to be significant concerning the 
interaction of these variables, the authors 
suggest that not too much importance ought 
to be attached to these results unless they are 
replicated by further studies. 

Because the number of significant effects 
was lower than chance as determined by the 
binomial theorem, none of the means in the 
other significant interactions were subjected 
to post hoc comparisons. Thus when looking
at the patterns of classroom interactions as
a function of race and sex of student, sex of 
student and race of teacher, and race and sex 
of teacher, the interaction patterns appear 
to be indistinguishable. 

The results of this study clearly indicated 
that black students and males received a 
greater proportion of the classroom inter- 
actions than did white students or females; 
that both male and female teachers acted in 
very similar ways with male and female 
students; and that there exists the possibility 
of a cross-race effect between white teachers 
and their students, with black students receiving
more than their fair share of the interactions.

This study investigated the relationships between the quality of a
student's group membership and four student outcomes in elementary
school classrooms. Jackson's model of person-group relationships was
employed for describing a student's membership relations with his or
her classroom peers. In addition, this study sought to determine if
relationships between group membership and student outcomes varied
in classrooms with different social structures. The data from 621
fifth- and sixth-grade students were treated by 4 X 2 and 2 X 2 X 2
analyses of covariance. The results indicated significant main
effects for group membership and social structure. It was concluded
that Jackson's conceptual model is a useful framework for
contributing to an understanding of student attitudes and behaviors
in elementary school classrooms.


Historically, much of the research devot- 
ed to understanding the educational process 
in the classroom has focused on the person- 
ality characteristics of the teacher, the be- 
havioral style of the teacher, or the quality 
of teacher-student relationships. Extensive 
reviews of these “teacher effectiveness” 
studies have been provided by Withall and 
Lewis (1963) and Dunkin and Biddle 
(1974). 

This focus on the teacher as the starting 
point for inquiry into the classroom group is 
understandable because the teacher, by po- 
sition, is the most influential figure in the 
classroom. However, this approach fails to 
consider some of the most important realities 
of the classroom setting. Specifically, the
effects of the classroom peer group¹ are
virtually ignored in the teacher effectiveness
research.

The present study departs from the usual focus on the teacher and is
concerned instead with the classroom peer group as a salient
referent group for the attitudes and behaviors of individual pupils.
Glick's (1968) group mediational model was employed as a theoretical
framework for organizing this research.

This research is based on a dissertation submitted to
express his appreciation to Arthur Blumberg, William

Glick (1968) looks at the educational process in the classroom as
composed of two sets of elements. One set is associated with the
teacher, the other with individual students. According to the
mediational model, the teacher's influence on individual students is
mediated by the conditions and processes of the classroom peer
group. The informal group processes of the classroom constitute a
set of variables that intervene between teacher behaviors and
individual student outcomes. These intervening variables are the
same set of constructs that have been investigated in the field of
group dynamics. Examples of these peer group variables are norms,
cohesiveness, membership relations, and social structure. In the
mediational model, the classroom is viewed as a complex social
system, with the teacher as a participant but not as the sole
determiner of the outcomes of learning.

Research stemming from the mediational model would focus on the
classroom peer group as a vehicle whose properties and processes
facilitate or impede the attainment of educational outcomes by
individual pupils. The scope of the present study was limited to an
investigation of the relationships between two selected peer group
variables (group membership and classroom social structure) and four
student outcome variables (achievement in reading, attitudes toward
school, self-concept as a learner, and school-related anxiety).

¹ In this research, the classroom peer group refers to all of the
students within an elementary school classroom.


Group Membership 


Jackson's (1959) conceptual model of
person-group relationships was employed to
describe the various types of membership
relations in elementary school classrooms.
Jackson's conceptual system tries to describe 
a person's psychological relationship to a 
group as opposed to his or her formal mem- 
bership. Jackson thinks that the fact of 
formal membership in itself tells nothing 
about a person's commitment to the values 
and goals of a group or about a group's ex- 
pectations concerning a person's attitudes 
and behaviors. This distinction between 
psychological and formal group membership 
is analogous to the differentiation between 
reference group and membership group 
prevalent in sociological theory (Merton & 
Rossi, 1968). Both concepts allow for the 
possibility of a person being a formal mem- 
ber of a group and at the same time not being 
a psychological group member. 

According to Jackson, the quality of an 
individual's psychological relationship to a 
group may be described by the combined 
effects of the attraction one feels for a group, 
its goals, values, and people; and acceptance, 
the relative clarity of role prescriptions for 
a person in a group.² Jackson coordinates
these two dimensions of attraction and acceptance
into an R-space (see Figure 1) to describe
various types of person-group relationships.

In the present study, group membership was examined at four
different levels. A student's attraction to group membership (high
or low) and his or her perceived acceptance by the group (high or
low) differentiate the quality of the group membership.

The upper right portion of Figure 1 identifies individuals who are
positively attracted to membership in the group and who perceive
that they are accepted in the group. In contrast, the lower left
portion of Figure 1 describes persons who feel that they are not
accepted within the group and who are negatively attracted to
membership in the group.

                  High Attraction

Low Acceptance                        High Acceptance

                  Low Attraction

Figure 1. The R-space.

The upper left portion of Figure 1 de- 
scribes persons for whom the group is at- 
tractive, but who feel that they are not ac- 
cepted in the group. And the lower right 
portion of Figure 1 identifies persons who 
feel that they are accepted in the group, but 
who do not value membership in the 
group. 
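
The four membership levels can be pictured as a simple two-way classification. The sketch below is only an illustration: the function name, the numeric scores, and the cut points are hypothetical, since the text here does not say how high and low attraction or acceptance were scored.

```python
def r_space_quadrant(attraction, acceptance, attraction_cut, acceptance_cut):
    """Place a student in one of the four regions of the R-space.

    Scores above a cut point count as "high"; the cut points are
    assumptions made for this illustration.
    """
    high_attraction = attraction > attraction_cut
    high_acceptance = acceptance > acceptance_cut
    vertical = "upper" if high_attraction else "lower"
    horizontal = "right" if high_acceptance else "left"
    label = "{} attraction, {} acceptance".format(
        "high" if high_attraction else "low",
        "high" if high_acceptance else "low",
    )
    return f"{vertical} {horizontal}", label


# A student attracted to the group who does not feel accepted by it.
print(r_space_quadrant(8, 3, attraction_cut=5, acceptance_cut=5))
```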


Classroom Social Structure 


Social structure in this research refers to 
the sociometric distribution of friendship in 
the classroom peer group. The sociometric 
distribution of friendship reflects the man- 
ner in which the members of a classroom 
distribute their interpersonal preferences 
with respect to the sociometric criterion of 
friendship. There is some evidence that 
elementary school classrooms differ with 
respect to their social structures. Schmuck 
(1963a, 1963b) delineated two structural 
variables that represent extremes in so- 
ciometric patterning and that can differen- 
tiate elementary school classrooms. His 
ideas center around the centrality or diffuseness
of sociometric choices as mapped by
a sociogram. These two structural types are
defined by Schmuck (1963b) as follows: 

Centralized structure. This type of structure is characterized by
narrowly focused interpersonal choice. In centralized structures, a
large number of pupils agree on a small number of classmates in a
given sociometric area. With this narrow focus on a few children,
many children are neglected entirely. The most extreme form of a
centralized structure exists when one child is chosen by every other
child in a given sociometric area.

² For an expanded description of the person-group
relationship framework, see Zeichner (1976).

Diffuse structure. This type of social 
structure may be considered at the opposite 
pole from centralized structures. Diffuse 
structures are distinguished by a more equal 
distribution of sociometric choices. There 
are no distinct subgroups whose members 
receive a large proportion of the preferences. 
Also, there are fewer neglected pupils in 
diffuse structures. The most extreme form 
of a diffuse structure exists when each child 
chooses another child in a circular chain with 
the number of links equal to the class en- 
rollment. 

Actually, of course, classroom social 
structures will fall along a continuum from 
the most centralized to the most diffuse so- 
cial structure. All conditions within these 
two extremes are theoretically possible. In 
the present study, classroom friendship 
structures were rank ordered according to 
their location on the centrality-diffuseness 
continuum and then were split at the median
into two groups: centralized and diffuse
classrooms. 
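
The median split can be sketched as follows. The centrality index, the classroom labels, and the rule that sends values equal to the median to the diffuse group are all assumptions made for illustration; the text does not describe how location on the continuum was quantified or how ties were handled.

```python
from statistics import median


def split_by_centrality(centrality):
    """Split classrooms at the median of a centrality index.

    `centrality` maps a classroom label to a hypothetical index in
    which larger values mean more narrowly focused choices.
    """
    cut = median(centrality.values())
    centralized = sorted(c for c, v in centrality.items() if v > cut)
    diffuse = sorted(c for c, v in centrality.items() if v <= cut)
    return centralized, diffuse


# Hypothetical index values for four classrooms.
example = {"A": 0.9, "B": 0.2, "C": 0.6, "D": 0.4}
print(split_by_centrality(example))
```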


Related Research 


There are very few studies that have uti- 
lized Jackson’s conceptual model in explor- 
ing the relationships between the quality of 
a student’s group membership and his or her 
attitudes and behaviors in the classroom. 
Glick (1969), in a study conducted in 12 
urban sixth-grade classrooms, examined the 
relationship between group membership and 
academic achievement. In this study, per- 
son-group relationships were measured by 
the Syracuse Scale of Social Relations 
(SSSR; Gardner & Thompson, 1956). Glick 
distinguished students on the basis of high 
and low acceptance and attraction and ex- 
amined four categories of group member- 
ship. It was hypothesized that the combi- 
nation of high acceptance and high attrac- 
tion would be the membership relation most 
conducive to academic achievement. Glick
generally found that the acceptance dimension
of group membership was more closely related to
academic achievement than the attraction dimension.

From the results of this study, it appears that the attraction
dimension of group membership makes little difference in the
attainment of educational outcomes. However, there are some
weaknesses in the study, related to the measurement of group
membership, that justify further research utilizing Jackson's model.
As mentioned previously, the SSSR was used to measure the
independent variable of group membership. There are two reasons why
the SSSR is an inappropriate tool for the measurement of group
membership as defined by Jackson (1959).

First, in the SSSR, attraction and acceptance are measured
sociometrically, with the attraction and acceptance scores of
individual students summed to produce an individual's total score.
While Jackson defines attraction as an individual's attraction to
the goals, values, and people of a group, the SSSR seems to only
account for the latter condition. Also, Jackson makes it clear that
acceptance in his model is something other than sociometric
acceptance or liking. Acceptance is defined by Jackson as the
relative clarity of role prescriptions for a person in a group.
Glick specifically focused on the sociometric liking score of each
individual.

Because of these two reasons, the SSSR seems to be an inappropriate
means of measuring Jackson's framework of person-group
relationships. It is felt that the utilization of a more appropriate
measuring instrument will lead to more productive results with
respect to group membership. This belief is supported in part by the
results of two studies that were conducted in high school
classrooms. Felsen (1973) and Felsen and Blumberg (Note 1),
utilizing factor analysis, were able to develop a more refined
instrument for the measurement of group membership (Person-Group
Relationship Scale-Student). Their work has important implications
for the investigation of group membership in elementary school
classrooms.

Felsen and Blumberg (Note 1) examined the relationship between the
nature of a student's classroom group membership and two student
outcomes: the student's anxiety level in the classroom and his or
her attitudes toward the school subject (English). The subjects were
150 seniors in high school English classes. Felsen and Blumberg
found that differences existed among students concerning their
attitudes toward the school subject and their anxiety levels in the
classroom that were related to the relationships that students saw
themselves holding to the classroom peer group. Those students who
saw themselves as "psychological members" (high attraction - high
acceptance) of the classroom peer group tended to have more
favorable attitudes toward school and less classroom-related anxiety
than students in any other membership relation. Felsen and Blumberg
concluded that the quality of a student's group membership seems to
be an important contributor to the attainment of student outcomes.

Felsen (1973) again investigated the re- 
lationship between group membership and 
selected student outcomes in high school 
classrooms. This time, the analysis was 
expanded to include as dependent variables 
academic progress in science, actual and 
desired effort committed to classroom ac- 
tivities, the discrepancy between actual and 
desired effort, anxiety levels, and attitudes 
toward the school subject (science). 

The subjects for Felsen's study were 598 
juniors and seniors in high school physics 
and chemistry classes. The results generally 
show that there were differences in school 
outcomes that were related to the quality of 
a student's group membership. Again, there 
were trends in the data showing students in
the category of high acceptance-high attraction
ranking higher in the attainment of school
outcomes than students in any other
membership category.

The findings of Felsen (1973) and Felsen and
Blumberg (Note 1) are partially inconsistent with
those of Glick (1969). On one hand, at an elementary
school level, Glick was unable to find differences in
achievement related to attraction. On the other hand,
the two studies conducted in high school classrooms,
employing the Person-Group Relationship Scale-Student,
imply that the application of more refined procedures
for the measurement of group membership will be able
to detect relationships between attraction and
outcomes in elementary school classrooms. The present
study utilized an instrument, similar to the one
developed by Felsen and Blumberg, and examined the
relationships between group membership and four
selected outcomes in elementary school classrooms.

Furthermore, this study sought to deter- 
mine if group membership interacted with 
the centrality-diffuseness of the classroom 
social structure. The studies of Felsen
(1973) and Felsen and Blumberg (Note 1) 
imply that the relationships between mem- 
bership and outcomes vary in different set- 
tings. Studies conducted by Schmuck 
(1963a, 1963b), together with certain theo- 
retical notions of Cartwright (1968), suggest 
that the classroom friendship structure is a 
situational variable that interacts with group 
membership. Schmuck's and Cartwright’s 
ideas specifically imply that group mem- 
bership is a more salient variable in diffuse 
classrooms than in centralized classrooms.³

In summary, the immediate objectives of 
this study were as follows: (a) to determine 
if and how the quality of a student's group 
membership is related to four student out- 
comes in elementary school classrooms and 
(b) to determine if the relationships between 
membership and outcomes vary in class- 
rooms with different social structures. 


³ For an expanded description of the theoretical
rationale for including classroom social structure as a
variable in the present research, see Zeichner (1976).
Briefly, Schmuck (1963a, 1963b) showed that the
sociometric friendship structure of a classroom’s peer
group was associated with the cohesiveness of the
classroom. Classrooms with diffuse friendship
structures were found to be more cohesive than classrooms
with centralized friendship structures. According to
Cartwright (1968), who defines cohesiveness as the sum 
of the members’ attraction to a group, high cohesiveness 
in a group increases the significance of membership for 
those who belong to a group. Members of highly co- 
hesive classes will be more concerned with their mem- 
bership relations than members in groups with low 
levels of cohesiveness. In other words, in classrooms 
with diffuse friendship structures (high-cohesive 
classes), the nature of a student's group membership 
should be more closely related to outcomes than in 
classes with centralized friendship structures (low- 
cohesive classes).

To assess the quality of a student’s perceived
membership relations with his or her classroom peer
group, an instrument was developed through factor
analysis and was pilot tested especially for this
study. This instrument, “My Classmates” (Zeichner,
Note 2), is an adaptation of the Person-Group
Relationship Scale (Felsen & Blumberg, Note 1), which
measures group membership in high school classrooms.⁴


Method 


Subjects 


The sample consisted of 621 upper elementary (fifth 
and sixth grade) students from 25 classrooms and 4 el- 
ementary schools in Syracuse, New York. This sample 
was selected on the basis of the cooperation of the four 
school principals. In addition, selection was limited to 
students in self-contained nonspecial education classrooms.
Table 1 provides information concerning the
sample, broken down by school, grade, and sex. 


Instrumentation 


Classroom social structure. The distribution of 
friendship choices within the peer groups of the 25 
sample classrooms was assessed by a standard socio- 
metric questionnaire that asked students to indicate the 
three students in their class that they liked the most 
(Fox, Luszki, & Schmuck, 1966). These data were then
converted into sociomatrices for each class that indicated
the number of choices received by each student
in a class and the density of choices within that class.
The relative centrality or diffuseness of a classroom
friendship structure was determined by an index of 
centrality that was based on the variance of the 
friendship distributions around a choice mean of three 
(Schmuck, 1962). 
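
As an illustration only, the index of centrality described above can be sketched in Python. The exact formula of Schmuck (1962) is not reproduced in this report, so the sketch assumes the index is the mean squared deviation of the number of choices each student receives from the choice mean of three. The function names and the sample classrooms are hypothetical.

```python
from collections import Counter


def choices_received(nominations):
    """Count how many times each student is named by classmates.

    `nominations` maps each student to the three classmates he or she
    liked the most (hypothetical input format).
    """
    counts = Counter({student: 0 for student in nominations})
    for chosen in nominations.values():
        for name in chosen:
            counts[name] += 1
    return dict(counts)


def centrality_index(nominations, choice_mean=3):
    """Assumed index: mean squared deviation of choices received
    around the choice mean (three choices per student)."""
    received = choices_received(nominations)
    return sum((c - choice_mean) ** 2 for c in received.values()) / len(received)


# Hypothetical diffuse class: choices are spread evenly.
diffuse = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}

# Hypothetical centralized class: most choices go to A, B, and C.
centralized = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
    "E": ["A", "B", "C"],
    "F": ["A", "B", "C"],
}

print(centrality_index(diffuse))
print(centrality_index(centralized))
```

Under this assumed definition, a larger value indicates that choices are concentrated on a few students (a centralized structure), and a value near zero indicates a diffuse structure.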

Achievement in reading. The total reading scores 
of the California Achievement Test (CAT; Tiegs & 
Clark, 1971) served as a base for measuring a student’s 
level of achievement in reading. The raw scores in vo- 
cabulary and comprehension were summed to provide 
a total reading score for each student. Then, in order 
to control for the three different levels of the CAT that 
were administered to this sample, the total scores were 
converted to Achievement Development Scale scores 
by the use of tables provided by Tiegs and Clark (1971).

Attitudes toward school. A student’s attitudes
toward school were measured by an index, computed
from a sentence completion test that asked students to
respond to five stems related to feelings about school
(Fox et al., 1966; Schmuck, 1962). Pupil responses were
scored by two raters on a 5-point scale ranging from
negative to positive, according to the scoring method
specified by Fox et al. (1966). A student’s score for each
item consisted of the sum of the two scores for that item
assigned by the two raters. The five individual item
scores were then summed, and an index was computed
that was equal to the mean of each student’s individual
item scores. Interrater reliability was computed for
each item and for the total measure.⁵


Table 1
Composition of the Subject Sample

                              School
Subject sample          1      2      3      4    Total
Sex
  Male                 82     77     67     86     312
  Female               86     78     73     72     309
Grade
  5                    89     69     76     89     323
  6                    79     86     64     69     298
Total                 168    155    140    158     621
No. Grades 5 and 6
  classrooms            7      6      6      6      25
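
The scoring of the attitudes-toward-school index described above can be sketched as follows. This is a minimal illustration under the stated scoring rule (each item score is the sum of two raters' 1-5 ratings, and the index is the mean of the five item scores); the function name and the sample ratings are hypothetical.

```python
def attitude_index(rater_a, rater_b):
    """Mean of the item scores, where each item score is the sum of
    the two raters' ratings (each on a 5-point scale) for that item."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same items")
    item_scores = [a + b for a, b in zip(rater_a, rater_b)]
    return sum(item_scores) / len(item_scores)


# Hypothetical ratings of one pupil's five sentence completions.
first_rater = [4, 5, 3, 4, 2]
second_rater = [4, 4, 3, 5, 2]

print(attitude_index(first_rater, second_rater))  # item scores 8, 9, 6, 9, 4
```

With two raters and a 5-point scale, the index computed this way ranges from 2 (most negative) to 10 (most positive).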

Self-concept as a learner. The Self-Esteem Inven- 
tory (SEI; Coopersmith, 1967) attempts to measure 
evaluative attitudes toward the self in social, academic, 
family, and personal areas of experience. It consists of 
58 items and 4 subscales. In the present study, the 
School Academic subscale of the SEI was employed as 
a measure of a student's self-concept as a learner. This
subscale consists of eight items (e.g., “I am proud of my 
schoolwork") that are answered by a student responding 
"like me" or “not like me.” Data on the reliability of 
the SEI are provided by Coopersmith (1967). 

School-related anxiety. The Test Anxiety Scale for 
Children (TASC; Sarason et al., 1960) was used to provide
a measure of a student's level of school anxiety.
This 36-item questionnaire attempts to measure a
child's conscious level of anxiety with regard to tests and 
learning tasks performed in the evaluative atmosphere 
of the classroom. Sarason et al. (1960) report test- 
retest reliability at +.67, with a 4-month interval be- 
tween testings. In addition, a series of studies were 
conducted that were designed to reflect on the validity 
of the TASC. These studies are reported in full by 
Sarason et al. (1960) and generally show that the TASC 
is negatively correlated with test and learning perfor- 
mance under most conditions. 


Procedure 


All of the data, except measures of achievement in
reading, were collected by the researcher and one
trained assistant during the months of April and May
1976. These measures were administered to all students
in their regular classrooms and, for the purpose
of this research, were grouped together into a booklet
entitled “How I Feel” (Zeichner, 1976). This booklet
consists of 68 questions and took approximately 30
minutes to administer. The entire booklet was read
aloud to all students. The students responded to each
item as it was read to them. Measures of achievement
in reading were obtained from standardized tests that
were administered by the classroom teachers during
May 1976.

⁴ For a copy of “My Classmates” and a complete report
of the factor analytic study, see Zeichner (Note 2).

⁵ Interrater reliabilities were as follows: for items
1-5, .91, .94, .92, .95, and .95; for total, .96.

Six percent (41 students) of the original sample (686)
were absent on the day of testing for their class, and data
were unavailable for 24 students on the standardized
reading test. All of the hypotheses were tested only for
those students with complete data (621).

Efforts were made to administer the “How I Feel” 
booklet under similar conditions in all classrooms 
through the development of standard test administration
procedures (Zeichner, 1976). In addition, special
forms were devised for reporting back to each teacher
summated class scores on the various dimensions. No
school authorities had access to any information con- 
cerning specific pupils. 


Design 


The initial design for this study was a 4 × 2 factorial
with four levels of group membership and two levels of 
classroom social structure. The data were analyzed by 
analysis of covariance with grade and sex employed as 
covariates. Additionally, the data were treated by a 
2 × 2 × 2 analysis of covariance in an attempt to more
fully explain the results of the original analysis. The 
initial hypotheses were concerned with the interactions 
between group membership and classroom social 
structure. The outcomes of these initial tests deter- 
mined the examination of simple effects or main effects. 
The hypotheses were tested separately for each of the 
four dependent variables. 
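
To make the structure of the 2 × 2 × 2 analysis concrete, the following Python sketch builds the coded predictors for one student. It uses the dummy codes reported in the notes to the results tables and lists the terms in the order covariates, main effects, interactions. Representing each interaction as the product of 0/1 codes is an assumption made for illustration; it is not a description of how the statistical package parameterized the model, and the names `CODES`, `FACTORS`, and `design_row` are hypothetical.

```python
from itertools import combinations

# Dummy codes as given in the table notes of this report.
CODES = {
    "grade": {"fifth": 0, "sixth": 1},
    "sex": {"male": 0, "female": 1},
    "acceptance": {"low": 0, "high": 1},
    "attraction": {"low": 0, "high": 1},
    "structure": {"diffuse": 0, "centralized": 1},
}
COVARIATES = ["grade", "sex"]
FACTORS = ["acceptance", "attraction", "structure"]


def design_row(student):
    """Return the coded terms for one student: covariates first,
    then main effects, then two-way and three-way interactions."""
    coded = {name: CODES[name][student[name]] for name in CODES}
    row = {name: coded[name] for name in COVARIATES + FACTORS}
    for size in (2, 3):
        for combo in combinations(FACTORS, size):
            product = 1
            for factor in combo:
                product *= coded[factor]  # assumed product coding
            row[" x ".join(combo)] = product
    return row


# Hypothetical student record.
example = design_row({
    "grade": "sixth",
    "sex": "female",
    "acceptance": "high",
    "attraction": "low",
    "structure": "centralized",
})
for term, value in example.items():
    print(term, value)
```

The four-level group membership factor of the 4 × 2 analysis corresponds to the four combinations of the acceptance and attraction codes, which is why the second analysis can separate its effect into acceptance, attraction, and their interaction.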


Results 


Characteristics of Group Membership in 
the Sample 


A description of the total sample in terms
of the numbers and percentages of students
within each of the four categories of group
membership is presented in Figure 2. It
appears that the largest percentage of students
fell into Quadrant 2 (high acceptance-high
attraction). Next, there are those students who
fell into Quadrant 1 (low acceptance-high
attraction). Together, Quadrants 1 and 2
accounted for 77% of the total sample. Finally,
in decreasing order are those students who fell
into Quadrant 3 (low acceptance-low attraction)
and into Quadrant 4 (high acceptance-low
attraction).


[Figure: 2 × 2 grid of membership quadrants, with
acceptance (low to high) on the horizontal axis and
attraction (low to high) on the vertical axis.
Quadrant 1: 206 (33.2%). Quadrant 2: 272 (43.8%).]

Figure 2. Membership relations in the total sample.
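
The quadrant scheme described above can be expressed as a small lookup. The Python sketch below is illustrative only: it assumes each student has already been classified as low or high on acceptance and attraction (the cutoff used for that classification is not restated here), and the names `QUADRANTS`, `membership_quadrant`, and `percent_of_sample` are hypothetical. The counts are those reported for Quadrants 1 and 2.

```python
# (acceptance, attraction) -> quadrant number, as described in the text.
QUADRANTS = {
    ("low", "high"): 1,
    ("high", "high"): 2,
    ("low", "low"): 3,
    ("high", "low"): 4,
}


def membership_quadrant(acceptance, attraction):
    """Return the membership quadrant for dichotomized scores."""
    return QUADRANTS[(acceptance, attraction)]


def percent_of_sample(count, total=621):
    """Percentage of the total sample, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(membership_quadrant("high", "high"))
print(percent_of_sample(206), percent_of_sample(272))
print(percent_of_sample(206 + 272))
```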


Achievement in Reading⁶


A summary of the two analyses performed 
on the data from achievement in reading is 
presented in Table 2. The results of these 
analyses indicate that group membership did 
not interact with the centrality-diffuseness
of a classroom's social structure. Also, none 
of the components of group membership 
interacted individually with social structure. 
However, there was a significant relationship 
between the nature of the classroom social 
structure and a student's level of achieve- 
ment in reading. Specifically, students in 
diffuse classrooms achieved higher in read- 
ing than students in centralized class- 
rooms. 

In addition, there was a significant relationship
between the acceptance dimension of group membership
and reading achievement. Students with high
acceptance in their classroom peer groups achieved
higher in reading than students with low acceptance.
On the other hand, the main effects for group
membership and attraction and the Acceptance ×
Attraction interaction were not significant.


⁶ The least-squares solution that was used to perform
both of the data analyses estimated the effects of the
components in the following order: (a) covariates
(grade and sex), (b) main effects, and (c) interactions.
These analyses were performed using the default option
in the Statistical Package for the Social Sciences analysis
of variance program (Nie et al., 1975).

⁷ In this research, .05 was used as the critical level for
determining statistical significance.

⁸ The three components of group membership are (a)
acceptance, (b) attraction, and (c) Acceptance ×
Attraction.


Table 2
Summary of the Results for Achievement in Reading

Variable                       df       F         p     Directionᵃ
4 × 2 analysis
  Grade (A)                     1     62.72      .01
  Sex (B)                       1       .37      ns
  Group membership (C)          3      2.30      ns
  Classroom social
    structure (D)               1      9.76      .01
  C × D                         3   [illegible]  ns
  Residual                    611
2 × 2 × 2 analysis
  Grade (A)                     1     62.72      .01    Positive
  Sex (B)                       1       .37      ns
  Acceptance (C)                1      6.72      .01    Positive
  Attraction (D)                1      1.67      ns
  Classroom social
    structure (E)               1      9.76      .01    Negative
  C × D                         1      .008      ns
  C × E                         1       .88      ns
  D × E                         1   [illegible]  ns
  C × D × E                     1       .76      ns
  Residual                    611

ᵃ The data were coded as follows: For grade, fifth = 0 and sixth
= 1; for sex, male = 0 and female = 1; for acceptance and
attraction, low = 0 and high = 1; and for classroom social
structure, diffuse = 0 and centralized = 1.


Attitudes Toward School 


A summary of the two analyses performed 
on the data from attitudes toward school is 
presented in Table 3. These results indicate 
that group membership did not interact with 
the centrality-diffuseness of a classroom’s 
social structure. Also, none of the components
of group membership interacted individually
with social structure. However, there was a
significant relationship between the nature of
the classroom social structure and a student’s
attitudes toward school. Students in centralized
classrooms had more positive attitudes toward
school than students in diffuse classrooms.

Also, the quality of a student’s group
membership was significantly related to his
or her attitude toward school. The 2 × 2 ×
2 analysis indicates a significant main effect
for the attraction dimension of group membership.
Students who had high attraction
to their classroom peer groups had more
positive attitudes toward school than students
with low attraction. On the other
hand, the main effects for acceptance and
the Acceptance × Attraction interaction
were not significant.


Self-Concept as a Learner 


A summary of the two analyses performed
on the data from self-concept as a learner is
presented in Table 4. These results indicate
that group membership did not interact with
the centrality-diffuseness of a classroom’s
social structure. Also, none of the components
of group membership interacted with
social structure.

However, there was a relationship between
the quality of a student’s group membership
and his or her self-concept as a learner. The
2 × 2 × 2 analysis indicates significant main
effects for the acceptance and attraction
dimensions of group membership. Students
who had high acceptance in their classroom
peer groups had more positive self-concepts
than students with low acceptance. Also,
students with high attraction to their classroom
peer groups had more positive self-concepts
than students with low attraction.
On the other hand, the main effects for
classroom social structure and the Acceptance
× Attraction interaction were not
significant.


Table 3
Summary of the Results for Attitudes Toward School

Variable                       df       F         p     Directionᵃ
4 × 2 analysis
  Grade (A)                     1      3.80      ns
  Sex (B)                       1      4.78      .05
  Group membership (C)          3     10.36      .01
  Classroom social
    structure (D)               1     14.78      .01
  C × D                         3   [illegible]  ns
  Residual                    611
2 × 2 × 2 analysis
  Grade (A)                     1      3.80      ns
  Sex (B)                       1      4.78      .05    Positive
  Acceptance (C)                1      1.10      ns
  Attraction (D)                1     22.08      .01    Positive
  Classroom social
    structure (E)               1     14.78      .01    Positive
  C × D                         1      1.12      ns
  C × E                         1       .47      ns
  D × E                         1       .34      ns
  C × D × E                     1       .35      ns
  Residual                    611

ᵃ The data were coded as follows: For grade, fifth = 0 and sixth
= 1; for sex, male = 0 and female = 1; for acceptance and
attraction, low = 0 and high = 1; and for classroom social
structure, diffuse = 0 and centralized = 1.


Table 4
Summary of the Results for Self-Concept as a Learner

Variable                       df       F         p     Directionᵃ
4 × 2 analysis
  Grade (A)                     1      2.67      ns
  Sex (B)                       1       .86      ns
  Group membership (C)          3     22.31      .01
  Classroom social
    structure (D)               1      .001      ns
  C × D                         3       .44      ns
  Residual                    611
2 × 2 × 2 analysis
  Grade (A)                     1      2.67      ns
  Sex (B)                       1       .86      ns
  Acceptance (C)                1     39.21      .01    Positive
  Attraction (D)                1      7.71      .01    Positive
  Classroom social
    structure (E)               1      .001      ns
  C × D                         1       .03      ns
  C × E                         1       .50      ns
  D × E                         1       .35      ns
  C × D × E                     1       .41      ns
  Residual                    611

ᵃ The data were coded as follows: For grade, fifth = 0 and sixth
= 1; for sex, male = 0 and female = 1; for acceptance and
attraction, low = 0 and high = 1; and for classroom social
structure, diffuse = 0 and centralized = 1.


School-Related Anxiety 


A summary of the two analyses performed 
on the data from school-related anxiety is 
presented in Table 5. These results indicate 
that group membership did not interact with 
the centrality-diffuseness of a classroom's 
social structure. Also, none of the compo- 
nents of group membership interacted in- 
dividually with social structure. 

However, there was a significant relationship
between the quality of a student’s group
membership and his or her level of school-related
anxiety. The 2 × 2 × 2 analysis indicates
a significant main effect for the acceptance
dimension of group membership.
Students with high acceptance in their
classroom peer groups had lower levels of
school-related anxiety than students with
low acceptance. On the other hand, the
main effects for attraction and classroom
social structure and the Acceptance ×
Attraction interaction were not significant.


Table 5
Summary of the Results for School-Related Anxiety

Variable                       df       F         p     Directionᵃ
4 × 2 analysis
  Grade (A)                     1       .55      ns
  Sex (B)                       1     25.70      .01
  Group membership (C)          3      8.83      .01
  Classroom social
    structure (D)               1      1.38      ns
  C × D                         3       .34      ns
  Residual                    611
2 × 2 × 2 analysis
  Grade (A)                     1       .55      ns
  Sex (B)                       1     25.70      .01    Positive
  Acceptance (C)                1     29.95      .01    Negative
  Attraction (D)                1      1.25      ns
  Classroom social
    structure (E)               1      1.38      ns
  C × D                         1       .02      ns
  C × E                         1       .15      ns
  D × E                         1      1.00      ns
  C × D × E                     1       .03      ns
  Residual                    611

ᵃ The data were coded as follows: For grade, fifth = 0 and sixth
= 1; for sex, male = 0 and female = 1; for acceptance and
attraction, low = 0 and high = 1; and for classroom social
structure, diffuse = 0 and centralized = 1.


Discussion 


Although the expected interactions be- 
tween group membership and classroom 
social structure failed to materialize, the 
results of the present study contribute new 
information to the literature on classroom 
research. The quality of a student’s group 
membership has been shown to be a salient 
variable for the attainment of individual 
student outcomes in elementary school 
classrooms. The results of the present study 
seem to indicate that the quality of a stu- 
dent’s membership in his or her classroom 
peer group is related to the student’s atti- 
tudes toward school, self-concept as a 
learner, and school-related anxiety. These 
results are consistent with the findings of 
Felsen (1973) and Felsen and Blumberg 
(Note 1). Jackson's (1959) conceptual 
model of person-group relationships appears 
to be a useful framework for contributing to 
an understanding of student attitudes and 
behaviors in both upper elementary and high 
school classrooms. 

In addition, when the effects of the components
of group membership were considered
individually, the nature of the relationships
between membership and outcomes
was illuminated. The clarity of role
prescriptions for a person in a group (acceptance)
is suggestive of security in the peer
group. The results of the present study indicate
that the clarity of role prescriptions
was significantly related to several student
outcome variables. Students who perceived
high acceptance in their classroom peer
groups had more positive self-concepts,
higher reading achievement scores, and
lower levels of school-related anxiety than
students with low acceptance.

These results are consistent with the findings
of Kahn, Wolfe, Quinn, and Snoek
(1964) concerning the effects of role ambiguity
in industrial organizations. The absence
of a feeling of security in one’s peer
group seems to be detrimental to emotional
health and task performance both in the
factory and in the classroom. In addition,
the results of the present study with regard
to the acceptance-reading-achievement
relationship are consistent with the findings of
Glick (1969). One’s perceived acceptance in
the classroom peer group and not one’s
attraction to group membership seems to be
the critical factor for achievement.

On the other hand, the attraction dimension
of group membership was significantly
related to attitudes toward school and
self-concept as a learner. Students with high
attraction to their classroom peer groups had
more positive self-concepts than those students
with low attraction. Also, high attraction
to the classroom peer group seems
to extend beyond the peer group to feelings
about school in general.

The problem at hand, accounting for the
attitudes and behaviors of students, is
exceedingly complex because of the nature of
the classroom as a social organism. No single
approach to research in this area can be
expected to account for all of the variance in
student outcomes. However, given that the
huge bulk of classroom research has focused
almost exclusively on the teacher, the
present study seems to indicate that the
inclusion of peer group variables along with
teacher variables would be a productive area
for future research. Such an approach
would enable the explanation of more variance
than would be possible by a focus on
either set of variables alone. Also, a statistical
partitioning could then be made of the
relative impact of these two sets of variables
on various affective and cognitive student
outcomes. The present study, with its limited
emphasis on the classroom peer group,
has tentatively shown that group membership
is a peer group variable worthy of
consideration for such research.

It is suggested that when group membership
is employed as a variable in future research,
the effects of attraction, acceptance,
and the Acceptance × Attraction interaction
should be examined separately.
Although the present study indicated that
the quality of group membership was related
to several student outcomes, more information
was gained by examining the components
of group membership individually.
For example, the relationships between
membership and two variables, school-related
anxiety and attitudes toward school,
were discovered to be a result of only one of
the two dimensions of group membership.
Also, as this study has shown with respect to
reading achievement, it is possible that a
component of group membership could be
related to a student outcome even though
the overall quality of group membership is
not salient.

The results of the present study have direct
relevance for classroom teachers and for
those involved in the professional preparation
and continuing in-service education of
teachers. If, as the findings of the present
study indicate, the quality of a student’s
group membership is related to his or her
emotional health and academic performance,
teachers should actively attempt to
promote positive membership relations in
their classrooms. Furthermore, fostering
pupil security (acceptance) in the peer group
seems to be an especially important issue for
teachers.

Along these lines, teacher educators
should include in the professional preparation
of preservice teachers and in the continuing
education of in-service teachers
components that are specifically concerned
with the social dynamics of classroom
groups. Teachers should learn in the course
of their professional studies how to diagnose
the informal dynamics of student peer
groups and how to develop strategies for
influencing these processes in ways that are
beneficial for the emotional and academic
growth of students.

Group processes, whether acknowledged
or not, are operating within all classrooms.
It is felt that an understanding of these
informal properties gives the teacher greater
control of the classroom situation and provides
a greater opportunity for meeting the
needs of individual pupils. As Waller stated
in 1932,


I believe that all teachers, great and small, have need of
insight into the social realities of school life, that they
perish as teachers for lack of it. Young teachers fail
because they do not know how to keep order. Brilliant
specialists do their jobs poorly because they do not
understand the human nature of classrooms. Teacher
training has done much to improve the general run of
instruction, but it can do vastly more if it equips
beginning teachers with social insight. (p. v)

It is my belief that significant improvements
in the quality of education will only occur if 
attention is paid to the social dynamics of 
classroom groups. This is not to say that 
teaching techniques and curriculum mate- 
rials are unimportant. On the contrary, an 
understanding of the social dynamics of 
classrooms should enable teachers to make 
more intelligent use of these techniques and 
materials. The present study seems to bear 
out the notion that the attainment of inter- 
personal satisfactions within the classroom 
peer group by individual pupils is an im- 
portant issue for consideration by classroom 
teachers.


Effects of Differential Rewarding and Sex
on Math Performance


James W. Michaels
Department of Sociology
Virginia Polytechnic Institute and State University


In an attempt to merge the reinforcement and competition approaches and
test the relative efficacy of reward structures, two levels of differential group
rewarding were paired with three levels of differential rewarding within
groups to form six reward structure treatments. College students worked on
math problems under the reward structures for a series of performance-pay
trials, receiving performance feedback after each trial. The math performance
of females was consistent with the differential rewarding hypotheses,
whereas that of males was consistent with an alternative spontaneous
competition hypothesis. Suggestions for reducing the sex gap in math
performance were made based on the findings.


According to a recent review, classroom
reward structures refer to the performance
contingencies, criteria, or standards students
must satisfy in order to receive presumably
valued consequences, such as prizes or high
grades (Michaels, 1977, p. 87). Although not
explicit in that review, the major purpose of
explicit reward structures is to strengthen
task performance, particularly where effort
and performance might otherwise be low. In
other words, reward structures provide incentives
for higher performance.


Requests for reprints should be sent to James W.
Michaels, Department of Sociology, Virginia Polytechnic
Institute and State University, Blacksburg,
Virginia 24061.


The four basic reward structures most
frequently examined in laboratory and
classroom settings are individual and group
reward contingencies and individual and
group competition. The former two are
associated with the reinforcement approach to
structuring rewards, whereas the latter two
are associated with the competition-cooperation
approach. The review also describes
how each reward structure is typically
operationalized (Michaels, 1977, pp. 88-89).
The first basic distinction is made in terms
of the performing unit, individual or group,
to which the reward structure applies. The
second basic distinction is made in terms of
the reward independence among units under
reward contingencies versus the negative
reward interdependence among units in
competition. Thus, under reward contingencies,
the probability or magnitude of
rewards for one unit is unrelated to those for
other units because the performance of each
unit is evaluated relative to a previously
determined criterion. In contrast, under
competition, the probability or magnitude
of rewards is negatively related among units
because the performance of each unit is
evaluated relative to the performance of
other units, and rewards are allocated
accordingly.
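
The distinction between reward independence and negative reward interdependence can be illustrated with a short Python sketch. It is a hypothetical example, not the procedure of any study cited here: the function names, the criterion of 10, and the prize values are assumptions chosen only to show that one unit's gain leaves other units' rewards unchanged under a contingency but can reduce them under competition.

```python
def contingency_rewards(scores, criterion, reward=1.0):
    """Reward contingency: each unit is rewarded if its own score
    meets a previously determined criterion."""
    return [reward if score >= criterion else 0.0 for score in scores]


def competition_rewards(scores, prizes):
    """Competition: prizes are allocated by rank, so each unit's
    reward depends on the performance of the other units."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    rewards = [0.0] * len(scores)
    for rank, unit in enumerate(order):
        if rank < len(prizes):
            rewards[unit] = prizes[rank]
    return rewards


before = [12, 9, 15]   # hypothetical scores of three units
after = [20, 9, 15]    # the first unit improves

# Under the contingency, the third unit's reward is unaffected.
print(contingency_rewards(before, criterion=10))
print(contingency_rewards(after, criterion=10))

# Under competition, the first unit's gain lowers the third unit's reward.
print(competition_rewards(before, prizes=[3.0, 1.0]))
print(competition_rewards(after, prizes=[3.0, 1.0]))
```

The same two functions apply whether the performing unit is an individual or a group; only the meaning of each score changes.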

Although the reinforcement and compe- 
tition-cooperation literatures with few ex- 
ceptions remain separate, a brief review of 
representative comparisons may help dis- 
tinguish the settled from the unsettled 
issues. 

The massive and rapidly expanding re- 
search literatures in behavior modification 
document the effectiveness of both indi- 
vidual and group reward contingencies in 
strengthening performance in schools (e.g., 
see reviews by Altman & Linton, 1971; 
O'Leary & Drabman, 1971). Although the 
relative effectiveness of individual versus 
group reward contingencies has been the 
focus of many studies (e.g., see the review by 
Litow & Pumroy, 1975), for the most part, 
the target behaviors addressed have been 
those believed to be prerequisites for learning
(e.g., attendance and attending to task)
rather than actual academic learning and 
task performance. The few comparisons 
that have addressed actual learning and 
performance have found individual and 
group contingencies to be about equally ef- 
fective (Hamblin, Hathaway, & Wodarski, 
1971; Hammond & Goldman, 1961; Herman 
& Tramontana, 1971). 
Miller and Hamblin (1963) resolved a 
substantial issue involving individual com- 
petition when their review found individual 
competition to be generally more effective 
for strengthening performance on individual 
tasks but group competition more effective 
for strengthening group performance on in- 
terdependent or cooperative tasks. Al- 
though this interpretation is consistent with 
the conclusions of more recent reviews 
(Johnson & Johnson, 1974; Michaels, 1977), 
some inconsistencies remain. For example, 
individual competition has also been found 
to be more effective than group competition 
for both types of tasks (Julian & Perry, 1967; 
Weinstein & Holzbach, 1972), and the su- 
periority of individual competition has been 
found not to hold when there was concurrent 
group competition (Goldman, Stockbauer, 
& McAuliffe, 1976). 

Because the reinforcement and competition
approaches have for the most part
remained separate, very few studies have
examined the relative effectiveness of reward
contingencies and competition. Julian and
Perry (1967) found individual competition 
to be most effective, group competition in- 
termediate, and a group reward contingency 
least effective in strengthening both quantity
and quality of laboratory exercises. Scott 
and Cherrington (1974) found individual 
competition to be more effective in 
strengthening test scoring performance than 
group competition and an individual reward 
contingency. Hammond and Goldman 
(1961), however, found no significant performance
differences on problem-solving
tasks among all four reward structures.

Due to the dearth of studies comparing 
reward contingencies and competition, as 
well as inconsistencies among those com- 
parisons that have been made, conclusions 
regarding the relative effectiveness of reward 
contingencies and competition remain tenuous.
However, this has not precluded investigators
from emphasizing the advantages
of their own preferred approach or reward 
structure. Because inconsistent findings 
across studies can be due to differences in 
the specific operationalizations of the reward 
structures (including type and magnitude of 
rewards used and specific reward procedures),
the types of tasks to which the reward
structures apply, and how performance is 
measured, any investigator can find support 
for a particular preferred reward structure 
or approach and rationalize exceptions. 
This is a fortunate situation if we only want 
to be able to find support for our own biases, 
but it is an unfortunate one if we seek a more 
satisfactory understanding of reward struc- 
tures. 

A fair test of the efficacy of reward structures
would appear to require the following
reformulation and procedures. First, re- 
ward structures should be conceptualized 
and operationalized on the basis of an oper- 
ation common to both the reinforcement and 
competition approaches. Second, the re- 
sulting operationalizations of reward struc- 
tures should be consistent with their re- 
spective traditional conceptualizations. 
Third, the same type and approximate 
magnitude of rewards should be made 
available under each operationalized reward
structure. Fourth, in order for the reward 
structures to have an effect on subsequent 
performance, performers should operate 
under the reward structures for a series of 
performance-reward trials rather than a
single trial. Finally, the task and performance
measures should be identical for all
operationalized reward structures. The
following formulation and procedures explain
how these conditions were met in the
present study. 


Formulation 
Reward Structure Treatments 


Because both approaches differentially 
reward individuals, groups, or both, the 
concept of differential rewarding was chosen
as the basis for conceptually and operationally
linking the reinforcement and
competition approaches.


Table 1
Group and Individual Reward Conditions and Composite Reward Structure Treatments

Columns: individual rewarding within groups (nondifferential, moderate differential, high differential).
Rows: group rewarding (nondifferential, differential).

Nondifferential group rewarding
    Nondifferential:        Noncontingent rewarding.
    Moderate differential:  Moderate individual competition; individual rewards
                            moderately negatively related.
    High differential:      High individual competition; individual rewards
                            highly negatively related.

Differential group rewarding
    Nondifferential:        Group reward contingency; individual rewards highly
                            positively related; group piece rate.
    Moderate differential:  Individual reward contingency; individual rewards
                            positively related = individual rewards negatively
                            related; individual piece rate.
    High differential:      Group reward contingency and high individual
                            competition; individual rewards more negatively
                            related than positively related.




Two levels of differential group rewarding 
were paired with three levels of differential 
rewarding within groups to form the six 
composite reward structures used as treat- 
ments. The two levels of differential group 
rewarding were nondifferential group re- 
warding (NDGR) and differential group re- 
warding (DGR). In the NDGR condition, 
groups were paid identically on each trial 
regardless of group performance (i.e., fixed 
or noncontingent rewarding of groups); 
whereas in the DGR condition, groups were 
paid in direct proportion to group perfor- 
mance (i.e., group piece rate). 

The three levels of differential rewarding
of individuals within groups were nondifferential,
moderate-differential, and high-differential
individual rewarding (NDIR,
MDIR, and HDIR, respectively). In the
NDIR condition, group pay was allocated
equally to group members on each trial regardless
of the differences in individual
performance. In the MDIR condition,
group pay was allocated to group members
in direct proportion to relative performance
on each trial. Thus, if one dyad member
contributed 60% of the total group performance
on a particular trial, that member
received 60% of the total group pay for that
trial. In the HDIR condition, the higher
performer on each trial received 75% of the
total group pay.

NDGR and DGR were each paired with
NDIR, MDIR, and HDIR to form six composite
reward structure treatments. The six
reward structure treatments, along with
their traditional labels, are shown in Table 
1. By design, five of the reward structures 
should be familiar. These are noncontin- 
gent rewarding (NDGR-NDIR), moderate 
individual competition (NDGR-MDIR), 
high individual competition (NDGR- 
HDIR), group reward contingency (DGR- 
NDIR), and individual reward contingency 
(DGR-MDIR). These five reward struc- 
tures constitute those most frequently ap- 
plied and discussed in the literature. In the 
remaining reward structure (DGR-HDIR),
the performers’ rewards are both positively
related (due to the group contingency of
DGR) and negatively related (due to the
75%-25% split of group rewards of HDIR)
but more negatively than positively related
(i.e., a performer's earnings will generally be
less a function of group performance than of
whether or not the person outperformed the
other). The present design does not include
a cell for competition between groups.


Differential Rewarding Hypotheses 


The following hypotheses relating indi- 
vidual performance to differential rewarding 
were tested. 

Hypothesis 1. Individual performance 
will vary directly with differential group re- 
warding, being higher in the differential 
group rewarding (DGR) condition than in 
the nondifferential group rewarding 
(NDGR) condition.

Hypothesis 2. Individual performance
will vary directly with differential rewarding
within groups, being (a) higher in the moderate-differential
individual rewarding
(MDIR) condition than in the nondifferen- 
tial individual rewarding (NDIR) condition 
and (b) higher in the high-differential indi- 
vidual rewarding (HDIR) condition than in 
the moderate-differential individual re- 
warding (MDIR) condition. Hypothesis 2b 
is admittedly a severe test of the differential 
rewarding hypothesis because relevant re- 
search usually only compares differential to 
nondifferential rewarding. Thus, hypoth- 
esis 2b addresses the question of whether 
more severe (high) differential individual 
rewarding will strengthen performance sig- 
nificantly more than less severe (moderate) 
differential individual rewarding. 
Hypotheses 1 and 2 predict main effects 
of differential group and individual re- 
warding, respectively, but they say nothing 
about the relative effectiveness of the six 
composite reward structures that serve as 
treatments. Should any reward structure be 
more or less effective than others indepen- 
dent of the predicted differential rewarding 
main effects, this would be indicated by a 
significant interaction effect involving dif- 
ferential group and individual rewarding. 
Only if this occurs, should separate paired 
comparison tests of the reward structures be 
made. However, no such interaction effects 
were expected. Instead, because the reward 
structure treatments were composed of dif- 
ferent combinations of nondifferential and 
differential group and individual rewarding, 
the main effects of differential rewarding 
were expected to account for any differential 
effectiveness of reward structures. Unfor- 
tunately, this prediction assumes the awk- 
ward status of a null hypothesis in that fail- 
ure to find a significant group by individual 
differential rewarding effect would not be 
sufficient reason for asserting that there is 
none (Bakan, 1970). This is because the 
power of the test is in rejecting, rather than 
failing to reject, a null hypothesis. Thus, if
no such interaction effect is indicated, all
that can be concluded is that there is no evidence
that any of the reward structure
treatments are differentially effective independent
of group and individual differential
rewarding.


Task, Feedback, and Sex Considerations


The type of task selected, procedures that 
included immediate performance feedback 
after each trial, and a design balanced by sex 
raised an additional consideration. Because 
it was desirable to select a task for which 
objective measures of performance could be 
quickly determined, sets of three-step math 
problems were constructed to be used as the 
task. Based on the extensive review of ac- 
ademically related sex differences by Mac- 
coby and Jacklin (1974), males were ex- 
pected to outperform females on the math 
task. The same review suggests males may
be more competitive than females in settings 
where competitiveness produces greater in- 
dividual rewards. The presence of perfor- 
mance feedback after each trial was expected 
to strengthen this effect, particularly since 
math is typically regarded as a male sex- 
typed subject by both sexes (Fennema & 
Sherman, 1977). Thus, regardless of reward 
structure, performance feedback after each 
trial, particularly on a male sex-typed task, 
may provide a stronger stimulus for spon- 
taneous competition for males than for fe- 
males. If this is the case, a motivation ceiling
effect may occur for males, reducing the
extent to which motivation and performance
can be further strengthened by differential
rewarding.

The following additional hypotheses were
derived from the previous considerations. 

Hypothesis 3. The math performance of 
males will be higher than that of females. 

Hypothesis 4. The math performance of 
females will show greater responsiveness to 
differential group and individual rewarding 
than will that of males. 


Method
Subjects 


There were 162 students enrolled in summer session
courses at a state university who served as subjects.
Volunteers were recruited from classrooms on the basis
of the opportunity to earn a variable amount of money
($4.00 to $12.00) for 1 hour's participation in a pay
systems experiment and were guaranteed no deception
and no aversive stimulation. Under the constraints of
same race, sex, and time availability, subjects were
randomly assigned to dyads, and dyads were randomly
assigned to reward structure treatments. The completed
design contained data on 144 subjects, 12 white
males and 12 white females in each of the six reward
structure treatments.1


Procedures 


Upon arrival, subjects were seated on the same side
of a long table separated by a partition. At their positions
were typed instructions, problem sheets, answer
sheets, and pencils. A male experimenter was positioned
at a second table approximately 8 feet (2.4 m) in
front of the subjects.2 At his position were a tape recorder
with recorded instructions, problem answer keys,
a performance-pay matrix, results sheets, a stopwatch,
and money for paying subjects.

Subjects were asked to read their copies of the in- 
structions as the taped version was being played. The 
instructions informed subjects that the purpose of the 
experiment was to find out how people performed on 
math problems under different pay systems. The instructions
then described the pay system they would be
working on by first describing how the group pay would 
be determined (i.e., whether group pay was fixed for 
each trial or would vary directly with group perfor- 
mance) and then describing how the group pay would 
be allocated between the two of them (i.e., whether 
equally regardless of individual performances, pro- 
portionate to their performances relative to one another, 
or 75% going to the higher performer). The instructions 
also explained how the math problems were to be 
worked and informed subjects that there would be 11 
separate performance-pay trials. At the end of the 
instructions, the experimenter offered to answer any 
questions the subjects might have regarding the pay 
system, the math problems, or the procedures to be
followed. The work session began after both subjects
expressed complete understanding of the instructions 
and procedures. 

On each of 11 performance-pay trials, subjects
worked for 2 minutes on a different set of 32 three-step
math problems similar to those used in previous research
on motivation by Raynor and Rubin (1971) and
Entin and Raynor (1973). The experimenter called
time at the end of 2 minutes, collected and scored
subjects’ answer sheets, and provided each subject with
a results sheet. The results sheet informed each subject
of the group’s, the subject’s own, and the other subject’s
performance (i.e., number of problems correct) and pay
for the trial. Subjects were then paid the amounts indicated
on the results sheet.

Table 2
Analysis of Variance Summary on Performance

Source                       df     MS        F
Sex (A)                       1    52.35     4.31*
Group rewarding (B)           1     9.41      .77
Individual rewarding (C)      2    54.78     4.51**
A × B                         1   131.04    10.78***
A × C                         2    16.93     1.39
B × C                         2     6.91      .57
A × B × C                     2    29.01     2.39
Error                       132    12.16
* p < .05.
** p < .025.
*** p < .001.

On each trial, dyads assigned to the NDGR condition 
received $1.20, allocated between subjects according to 
assigned individual rewarding condition (described 
previously); whereas dyads assigned to the DGR con- 
dition received $.04 for each problem, also allocated 
according to assigned individual rewarding condition. 
Group pay was allocated equally between dyad mem- 
bers on the few occasions that performance ties occurred 
under the high-differential individual rewarding con- 
dition. This occurred on fewer than 10% of the 
trials. 
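
The pay rules described above can be sketched in a few lines of Python. This is only an illustration of the stated procedure, not the experimenter's actual performance-pay matrix: the function name `allocate_pay` and its argument names are hypothetical, and the equal split used when neither member solves a problem under MDIR is an assumption, since the source describes ties only for the HDIR condition.

```python
def allocate_pay(group_rule, individual_rule, correct_a, correct_b):
    """Return (pay_a, pay_b) in dollars for one trial of a two-person dyad.

    group_rule:      "NDGR" (fixed $1.20 per trial) or "DGR" ($.04 per correct problem)
    individual_rule: "NDIR" (equal split), "MDIR" (proportional split),
                     or "HDIR" (75% to the higher performer)
    """
    total_correct = correct_a + correct_b

    # Step 1: determine the dyad's pay for the trial.
    if group_rule == "NDGR":
        group_pay = 1.20
    elif group_rule == "DGR":
        group_pay = 0.04 * total_correct
    else:
        raise ValueError("group_rule must be 'NDGR' or 'DGR'")

    # Step 2: determine member A's share of that pay.
    if individual_rule == "NDIR":
        share_a = 0.5
    elif individual_rule == "MDIR":
        # Assumption (not stated in the source): split equally if nobody scored.
        share_a = correct_a / total_correct if total_correct > 0 else 0.5
    elif individual_rule == "HDIR":
        if correct_a == correct_b:
            share_a = 0.5  # ties were split equally, as described in the text
        else:
            share_a = 0.75 if correct_a > correct_b else 0.25
    else:
        raise ValueError("individual_rule must be 'NDIR', 'MDIR', or 'HDIR'")

    return group_pay * share_a, group_pay * (1.0 - share_a)


if __name__ == "__main__":
    # One trial in which member A solves 12 problems and member B solves 8.
    for group_rule in ("NDGR", "DGR"):
        for individual_rule in ("NDIR", "MDIR", "HDIR"):
            pay_a, pay_b = allocate_pay(group_rule, individual_rule, 12, 8)
            print(f"{group_rule}-{individual_rule}: A = ${pay_a:.2f}, B = ${pay_b:.2f}")
```

Running the loop for the same pair of scores under all six treatments shows how the two members' earnings move together under DGR-NDIR and in opposite directions under NDGR-HDIR.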


Results 


Before considering the performance re- 
sults, a brief consideration of differences in 
subjects’ earnings is in order. The rate of 
$.04 for each correct problem in the DGR 
condition was set on the basis of pretest 
findings to insure that the mean magnitudes 
of pay in the two group rewarding conditions 
were similar (mean = $11.30 per dyad). The 
mean magnitudes of individual pay in the 
NDIR, MDIR, and HDIR conditions were 
$5.48, $5.69, and $5.78, respectively, but the 
mean pay difference between subjects in 
each dyad ranged from zero in the NDIR 
condition to $2.89 in the HDIR condition. 
The mean total pay for all subjects was 
$5.65. 

Performance measures consisting of the
numbers of problems completed correctly
were analyzed with a 2 × 2 × 3 factorial
analysis of variance design for effects due to
sex, differential group rewarding, and differential
individual rewarding within groups.
The results are summarized in Table 2.
Because the analysis of variance indicated a
main effect due to sex (males outperforming
females), F(1, 132) = 4.31, p < .05, and a
Sex × Group Rewarding interaction effect,
F(1, 132) = 10.78, p < .001, separate 2 × 3
analyses of variance were run for males and
females. The results of the separate analyses
are summarized in Table 3.3

1 The data for eight black female dyads were deleted
because the number of black male volunteers was insufficient
to complete an experimental design balanced
by race and sex. The data for one mixed-race male
dyad were also deleted.

2 The conclusion of a recent review of research on the
effects of sex of experimenter on the task performance
of subjects suggests the performance of both sexes,
particularly that of females, may be enhanced by a male
rather than female experimenter (Rumenik, Capasso,
& Hendrick, 1977). However, the conclusions were
drawn from studies involving verbal learning rather
than mathematical tasks, and several studies found no
such effects.

Table 3
Analyses of Variance Summaries on Performance

Source                         df     MS        F
Males
  Group rewarding (A)           1   105.35     6.73**
  Individual rewarding (B)      2     5.41      .35
  A × B                         2     9.20      .59
  Error                        66    15.66
Females
  Group rewarding (A)           1    39.49     4.45*
  Individual rewarding (B)      2    68.49     7.72**
  A × B                         2    24.53     2.76
  Error                        66     8.87
* p < .05.
** p < .01.
*** p < .001.
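
Each F value in these summaries is the ratio of an effect's mean square to the error mean square, so the tabled values can be checked by hand. The sketch below does this for the separate analyses in Table 3. It assumes the mean squares read 105.35, 39.49, and 24.53 where the scan dropped the decimal points, and the dictionary layout and the name `f_ratio` are illustrative only.

```python
def f_ratio(ms_effect, ms_error):
    """F statistic for a fixed-effects ANOVA term: effect MS over error MS."""
    return ms_effect / ms_error


# Mean squares as read from Table 3 (decimal placement is an assumption
# wherever the scanned table omits the decimal point).
table3 = {
    "males": {
        "error": 15.66,
        "effects": {"group": 105.35, "individual": 5.41, "group x individual": 9.20},
    },
    "females": {
        "error": 8.87,
        "effects": {"group": 39.49, "individual": 68.49, "group x individual": 24.53},
    },
}

if __name__ == "__main__":
    for sex, rows in table3.items():
        for effect, ms in rows["effects"].items():
            print(f"{sex:8s} {effect:20s} F = {f_ratio(ms, rows['error']):.2f}")
```

The recomputed ratios agree with the reported F values to within about .01, the size of discrepancy one would expect from rounding the mean squares to two decimals.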

Hypothesis 1 predicted performance 
would be higher in the DGR condition than 
in the NDGR condition. Hypothesis 1 was
supported only for females, F(1, 66) = 4.45,
p < .05. In fact, the performance of males
was higher in the NDGR condition than in 
the DGR, F(1, 66) = 6.73, p < .025. Thus, 
the math performance of females varied di- 
rectly with differential group rewarding, but 
the performance of males was higher when 
group rewards were fixed. The mean math 
performance of males and females in each of 
the reward structure treatments are shown 
in Table 4. 

Hypothesis 2 predicted performance 
would vary directly with differential indi- 
vidual rewarding within groups. Table 3 
indicates a main effect due to differential
individual rewarding for females, F(2, 66)
= 7.72, p < .01, but not for males, F(2, 66)
= .35. The specific comparisons testing
Hypotheses 2a and 2b for females indicated
that performance was higher in the MDIR
condition than in the NDIR, t(66) = 2.27, p
< .05, but not significantly higher in the
HDIR condition than in the MDIR, t(66) =
1.58. Thus, although a main effect of dif- 
ferential individual rewarding was indicated 
for females, more severe differential re- 
warding (HDIR) was not significantly more 
effective than less severe differential re- 
warding (MDIR). 

Finally, as indicated by the absence of a
Group Rewarding × Individual Rewarding
interaction effect for either sex, there is no
evidence that any of the six reward structure
treatments were differentially effective independent
of the main effects of differential
rewarding. The absence of such an interaction
effect also renders separate paired
comparisons of the treatments inappropriate.

As previously indicated, Hypothesis 3,
predicting higher performance for males
than females, was supported, F(1, 132) =
4.31, p < .05. However, the mean performance
of males and females was similar
under both the DGR condition (114.29 and
121.99, respectively) and the HDIR condition
(132.33 and 131.45, respectively).
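
The condition means quoted here are marginal means over the cells of Table 4. Because every cell contains the same number of subjects (12), a marginal mean is simply the unweighted average of the relevant cell means. The following sketch recomputes them from the cell means as printed in Table 4; the data structure and function names are illustrative, not part of the original analysis.

```python
# Cell means from Table 4, keyed by (sex, group rewarding condition).
cell_means = {
    ("males", "NDGR"): {"NDIR": 131.56, "MDIR": 149.71, "HDIR": 141.46},
    ("males", "DGR"): {"NDIR": 112.42, "MDIR": 107.36, "HDIR": 123.20},
    ("females", "NDGR"): {"NDIR": 83.49, "MDIR": 99.66, "HDIR": 136.84},
    ("females", "DGR"): {"NDIR": 106.70, "MDIR": 133.32, "HDIR": 126.06},
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def group_marginal(sex, group_rule):
    """Mean for one sex under one group rewarding condition (equal cell sizes)."""
    return mean(cell_means[(sex, group_rule)].values())


def individual_marginal(sex, individual_rule):
    """Mean for one sex under one individual rewarding condition (equal cell sizes)."""
    return mean(cell_means[(sex, g)][individual_rule] for g in ("NDGR", "DGR"))


if __name__ == "__main__":
    for sex in ("males", "females"):
        print(sex, "DGR:", round(group_marginal(sex, "DGR"), 2),
              "HDIR:", round(individual_marginal(sex, "HDIR"), 2))
```

The HDIR means reproduce the reported 132.33 and 131.45. The recomputed DGR means come out about .04 higher than the reported 114.29 and 121.99, a difference small enough to be attributable to rounding of the tabled cell means.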

Hypothesis 4 predicted that the performance
of females would be more responsive
to differential group and individual rewarding
than would that of males. Because
the performance of females varied directly
with both forms of differential rewarding
while that of males did not, Hypothesis 4 was
also supported.


Discussion 


Only the math performance of females
responded to both group and individual
differential rewarding as predicted by differential
rewarding Hypotheses 1 and 2.
However, high-differential individual rewarding
was not found to be significantly
more effective than moderate-differential
rewarding. Furthermore, there was no evidence
that any of the six reward structure
treatments were differentially effective in
strengthening performance independent of
the main effects of the two forms of differential
rewarding.

3 Additional analyses on number of problems completed
(rather than number correct) and by trial blocks
indicated effects identical to those reported here. The
correlation between number of problems completed and
correct was .91, and the only difference across trials was
significantly lower performance on the first than on
subsequent trials.

Table 4
Mean Performance and Standard Deviation by Composite Reward Structure Treatment and Sex

                         Individual rewarding within groups
Sex               Nondifferential   Moderate differential   High differential
Nondifferential group rewarding
  Males
    M                 131.56              149.71                141.46
    SD                 53.98               55.36                 31.85
  Females
    M                  83.49               99.66                136.84
    SD                 27.53               41.67                 29.57
Differential group rewarding
  Males
    M                 112.42              107.36                123.20
    SD                 34.64               32.64                 34.20
  Females
    M                 106.70              133.32                126.06
    SD                 32.95               28.57                 23.99

The performance of males did not respond 
directly to either form of differential re- 
warding. Although the effects of differential 
individual rewarding were at least in the 
predicted direction, males actually per- 
formed higher in the nondifferential group 
rewarding (NDGR) condition than in the 
differential group rewarding (DGR) condi- 
tion. Although this unexpected finding is 
not interpretable from the explicit differ- 
ential rewarding formulation, it appears to 
be consistent with the hypothesized spon- 
taneous competition effect for males (Hy- 
pothesis 4). Specifically, pay was fixed (i.e., 
scarce) in the NDGR condition, whereas a
group reward contingency applied in the
DGR condition. The group reward contingency,
a cooperative structure according to
Deutsch (1949), probably provided less of a
stimulus for competition than fixed group
pay. The finding is also consistent with our
understanding of behavior consequences as
a class. That is, the NDGR condition still
offered the prepotent consequence of “winning”
or outperforming the other. It is important
to understand that “rewards” or
consequences as a class may be differentially 
effective for different individuals, for the 
same individuals at different times, in dif- 
ferent situations, or with different others 
present. This interpretation, however, must 
be regarded as speculative until the findings 
are replicated under a design that also 
manipulates performance feedback (present 
and absent). Finally, it is also possible that 
randomization failed to equalize math ability 
in the NDGR and DGR conditions (n = 36 
in each condition).

Differential individual rewarding within 
groups had no significant effect on the math 
performance of males. This is contrary to 
Hypothesis 2 (from the differential reward- 
ing formulation) but is consistent with Hy- 
pothesis 4 (from the spontaneous competi- 
tion formulation). Thus, immediate per- 
formance feedback after each trial appar- 
ently provided a stronger stimulus for 
spontaneous competition for males, pro- 
ducing generally high motivation and per- 
formance regardless of how group pay was 
allocated between performers. Thus, 
beating the other person constituted a po- 
tential source of rewards across all reward 
structure treatments. 

One may wonder why subjects generally
performed well even when their pay was in
no way contingent on their performance (i.e.,
in the NDGR-NDIR treatment). Three 
factors may have operated to maintain a 
reasonable level of performance, even though 
pay remained constant regardless of per- 
formance. First, as already indicated, per- 
formance feedback after each trial may have 
offered some incentive to do well, either in 
relation to the other subject or in relation to 
one's own past performance. Second, 
subjects may not have believed that they 
would actually be paid the same amount 
even if they did nothing.4 Third, just as in
other work settings, subjects may have be- 
lieved it only fair to perform as long as they 
were being paid. Taking these consider- 
ations into account with students as subjects, 
we should expect a reasonable level of per- 
formance in work settings of this type, even 
when the material rewards administered are 
not directly or immediately contingent on 
level of performance achieved. 

It may be worthy of note that previous 
research on reward structures has not found 
(or at least reported) similar sex by treat- 
ment interaction effects, possibly because 
the methods and/or analyses did not yield 
such effects. Thus, the interaction effects 
of the present study may be due to the 
unique combination of the male sex-typed 
task, immediate performance feedback after 
each trial, multiple performance trials, 
and/or the inclusion of sex as a factor in the 
statistical analysis. However, because the 
present findings are unique in the reward 
structure literature, more careful examina- 
tion of the interaction is advised. For ex- 
ample, the apparent spontaneous competi- 
tion effect for males would be more ade- 
quately tested in a design that varied per- 
formance feedback (absent or present) as 
well as differential rewarding. 

The present findings may have consider- 
able implications for reducing the sex gap in 
math performance. Consistent with previ- 
ous findings (e.g., see Maccoby & Jacklin, 
1974), the math performance of males was 
generally higher than that of females. 
However, the math performance of females 
equaled that of males under both differential 
group rewarding and high-differential indi- 
vidual rewarding within groups. This
finding suggests the sex gap in math per-
formance may be reduced by operationaliz- 
ing more explicit differential group or indi- 
vidual rewarding within classrooms, partic- 
ularly when performance feedback applies. 
Although a case may be made that scores on 
math tests constitute both feedback and 
differential rewarding, unless these scores 
are associated with immediate valued re- 
wards as in the present study, the effects of
math scores may be more closely approxi- 
mating those of feedback—a condition that 
may provide stronger performance incentive 
for males than for females. 

Although the present study operational- 
ized and tested reward structures on the
basis of differential rewarding, the findings, 
consistent with previous reviews (Miller & 
Hamblin, 1963; Michaels, 1977), lend some 
support for the efficacy of competition for 
strengthening individual performance on 
academic tasks. The support derives from 
the main effect of differential individual 
rewarding (i.e., competition in this case) for 
females and the apparent spontaneous 
competition for males. 

Finally, generalizability of the present 
findings to the typical classroom setting may 
be restricted by the unique combination of
factors in the present study. These factors
include the use of college students as
subjects, a single work session in a laboratory
setting, a math task with a time constraint,
and monetary incentives.


4 Yet, all subjects communicated an understanding
of the pay structure they would be working under.
Although much lower performance was anticipated in
the noncontingent NDGR-NDIR treatment, only one
subject performed poorly, and he at least reflected
... mind by missing each problem by a single
digit.


Predictive Validity of Readiness Tests for Middle and
Lower Socioeconomic Status Anglo, Black, and
Mexican American Children


Thomas Oakland


University of Texas at Austin


This study determined the predictive validity of six tests of academic readiness
for Anglo, black, and Mexican American first-grade children from middle
and lower socioeconomic status (SES) homes. Criterion variables consisted
of measures of intelligence at second grade and achievement at second and
fourth grades. Seventy-seven percent of the correlations were statistically
significant for the total sample. However, important differences existed between
the three racial-ethnic groups and two social classes. The readiness
tests tended to be most valid for Anglos and least valid for blacks. Within
each racial-ethnic group, the tests tended to be more valid for middle than for
lower SES children.

Readiness measures commonly are used 
by schools to assess children's scholastic 
aptitudes in order to form homogeneous 
groups, to identify children who are less able 
to engage in formal academic instruction, 
and to identify those skills and abilities that 
need to be improved through remediation or 
compensatory education programs. In 
general, readiness tests are designed to 
measure current performance levels in areas 
important to children's subsequent academic 
development (i.e., to predict later achieve- 
ment). 

Among the various psychometric characteristics
that must be considered in
standardizing a test, validity perhaps is the
most important (e.g., Davis, 1974). Recently,
the need to determine a test’s validity
has been underscored through federal legislation
(Public Laws 93-380 and 94-142) and
litigation (Oakland & Laosa, 1977) that re- 
quire schools to develop assessment pro- 
grams that are nondiscriminatory with re- 
spect to racial-ethnic bias and to demon- 
strate the accuracy of a test’s use (Bersoff, 
1973; Cronbach, 1975). Thus, school sys- 
tems that use readiness measures are being 
strongly encouraged to validate tests for use 
with children from various racial-ethnic and 
socioeconomic status (SES) groups. 

Considerable literature exists that exam- 
ines the predictive validity of scholastic ap- 
titude measures for black adolescents and 
adults entering colleges and universities (e.g.,
Cleary, Humphreys, Kendrick, & Wesman, 
1975; Stanley, 1971). However, data that 
examine the predictive validity of various 
standardized readiness measures with both 
Anglo and minority-group children entering 
elementary schools could not be found. 
This is surprising in light of the widespread 
use of these measures. 

Within the last 5 years, a number of psychometrically
based culture-fair models of
assessment have been proposed (Petersen &
Novick, 1976) that presumably enable us to
use test results in a more fair and equitable
fashion with minority-group persons. Basic
to all models is the need to establish a measure’s
criterion validity for the separate
groups for whom regression equations are
developed. The lack of research literature
with minority group pupils, together with
a lack of separate normative data for them
in test manuals, currently precludes their
general use.

Copyright 1978 by the American Psychological Association, Inc. 0022-0663/78/7004-0574$00.75

Table 1
Sample Size for the Six Groups on the Metropolitan Readiness Tests (MRT) and
Three Dependent Variables

Group                  MRT    MAT   CTMM    CAT
Anglo
  Middle SES            65     43     28     31
  Lower SES             60     43     26     35
Mexican American
  Middle SES            59     41     32     36
  Lower SES             80     65     50     59
Black
  Middle SES            64     52     41     40
  Lower SES             83     65     45     50
Total                  411    309    223    251
Note. SES = socioeconomic status, MAT = Metropolitan Achievement Tests,
CTMM = California Test of Mental Maturity, and CAT = California Achievement Tests.

The purposes of this study, then, are to 
determine the predictive validity of six 
measures of school readiness for a repre- 
sentative group of children entering first 
grade and to compare the validity coeffi- 
cients for children from middle and lower 
SES backgrounds and from three racial- 
ethnic groups (Anglo, black, and Mexican 
American). 


Method 


Subjects 


There were 411 middle- and lower-class Anglo, black, 
and Mexican American children initially entering first 
grade who were chosen for this study. The children 
were selected from 18 elementary schools in one school 
district (of 55,000 pupils) in which we obtained the 
principals’ permission to test. Approximately one third 
of the pupils in these schools were selected for the 
sample. A stratified sampling design was employed to 
acquire both middle- and lower-class children from 
three racial-ethnic groups. The sample initially in- 
cluded more lower-class than middle-class children, 
anticipating a higher attrition rate among lower-class 
children. Socioeconomic status was determined on the 
basis of father’s occupation or, if the father was absent 
from the home, mother’s occupation, using the Warner, 
Meeker, and Eells (1949) scale. Six groups were chosen:
middle-class Anglos (n = 65) and lower-class Anglos
(n = 60), middle-class blacks (n = 64) and lower-class
blacks (n = 83), and middle-class Mexican Americans
(n = 59) and lower-class Mexican Americans (n = 80).


Procedure


The Metropolitan Readiness Tests (MRT; Hildreth, 
Griffiths, & McGauvran, 1964) were administered to all 
children by personnel from the local school district; 
these data were acquired from the test files. Five ad- 
ditional readiness tests also were administered during 
the first 3 months of Grade 1: the Language and Con- 
cepts tests from the Tests of Basic Experiences (Moss, 
1970), the Clymer-Barrett Prereading Battery (Clymer 
& Barrett, 1968), and the Slosson Intelligence Test and 
the Slosson Oral Reading Test (Slosson, 1963). These 
tests were chosen to include those that are most fre- 
quently used by school personnel (e.g., the MRT and the 
Clymer-Barrett Prereading Battery), those that were 
developed for use with minority children (e.g., Tests of 
Basic Experiences), and those that frequently are used by
school and clinical psychologists in diagnosing learning 
problems (e.g., the Slosson Intelligence Test and Slosson 
Oral Reading Test). All are designed and used to pre- 
dict how well children will do academically, principally 
in reading and math. Except for the Slosson Intelli- 
gence Test, their focus is limited to the primary grades. 
Data from the Metropolitan Achievement Tests (Durost 
et al., 1958) and the California Test of Mental Maturity 
(Sullivan, Clark, & Tiegs, 1962), administered in the 
spring of second grade, and the California Achievement 
Tests (Tiegs & Clark, 1970), administered in the spring 
of fourth grade, were obtained from the school dis- 
trict. 

Table 1 contains the numbers of children included on 
the major tests over 4 years. As expected, attrition 
occurred during this 4-year interval. In most cases, the
sample sizes and their respective standard deviations
were sufficiently large to warrant the use of correlations 
with data from the total sample and for the three ra- 
cial-ethnic groups. The sample sizes for the six 
subgroups occasionally are insufficient, in which case 
the data are not reported (see Table 2). Relationships 
between the readiness measures and later tests of 
achievement and intelligence were obtained through
Pearson product-moment correlations. 
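
The Pearson product-moment correlation named above can be sketched as follows. This is a minimal illustration, not the authors' computing procedure; the function name and the two score lists are hypothetical and were invented for the example, not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical readiness and later achievement scores (illustration only).
readiness = [45, 52, 60, 38, 71, 66]
achievement = [2.1, 2.4, 3.0, 1.9, 3.6, 3.1]
r = pearson_r(readiness, achievement)
print(round(r, 2))
```

The zero-variance check matters for small cells: a subgroup in which every child earns the same score yields no defined coefficient.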


Results 


Correlations Between the Metropolitan 
Readiness Tests and Later Measures of 
Achievement and Intelligence 


Metropolitan Achievement Tests 
(MAT). Correlations between the MRT 
total score and the MAT (acquired at second 
grade) for the total sample (see Table 2) are 
statistically significant for Reading (.43) and 
Arithmetic (.44). For middle- and lower- 
class Anglo children, the correlations are 


Table 2
Means and Standard Deviations for the Readiness Tests and Their Correlations with Later
Measures of Achievement and Intelligence for Three Racial-Ethnic and Two Socioeconomic
Status Groups

Metropolitan California California Test
Achievement Achievement of Mental
Tests Tests Maturity
Reading Arith- Reading Math Language Non-Lan- Total
Group M SD total metic total total IQ guage IQ IQ
Metropolitan Readiness Tests: total score 
"Total dac" dase AB** .38** alba 142v* 
Total Anglo 60.1 17.5 gee  .82**  .62** — .er* 52** .68** 
Middle class 646 17.7 gree .88** — .739**—.06** ,59** .63** 
Lower class 553 15.9 ‘63** .80**  .45%* .54** A3* noo" * 
Total Mexican American 517 16.0 -30* 19 Aot* — apte It irt 
Middle class 55.3 16.7 .43* 24 A3* A6** .48** .45** 
Lower class 49.0 148 01 06 34" 22 04 16 
"Total black 487 16.9 15 .39* MIU o pot .23* 20" 
Middle class §2.9 17.0 21 ,g8** . .d9** 38" ute 44** 
Lower class 45.8 16.0 18 15 .23 .08. .09 15 
Tests of Basic Experiences: Language and Concepts
d ee :39** — 24* .96** 32** AT A5** 
‘ol lo 37.0 69 AT* ‘Ste es 
Middle class 364 76 70° [M i i3 E 
Lower class 376 60 9 49 29 —.04 
e American — 309 8.2 88* E 37 AT Al* .50** 
class 325 86 t E 
51 13 67 43 67** 70** 
Lower class 294 75 05 E j 
Pick 1 n 3 E .09 51 23 .32 
k 320 68 27 27 10 
Middle class ED QUA aom e k 
Lower class 38 72 339 47 -20 9 à En 
Tests of Basic Experiences: Language
"Total Anglo E ‘ 03 .03 30** i91** 
Middle class. T M ue " De Mee 62** 63** 
Lower class 900 30 22 56 -26 -0 
Total Mexican American — 181 9.5 13 -07 A a 
Middle class 176 53 pr ^ m .08 .39* 45** 
Lower class. 185 123 —02 Bl 5 :66* .63* .68** 
‘Total black 182 52 . TH EL. au 39 48 
Middle class ELS o» &mo-0 al 
Lower class aT -p 01 15 05  —.08 12 
^ AT —.35 —.04 40 Al 
Tests of Basic Experiences: Concepts
Total Anglo 17.9 38 pon BBS 138th 22 4A4** 38** 
Middle class DR M ue Mw M 36 
Lower class 176 34 r. yrs ; 
Total Mexican American 14.2 4.0 5 45  —28  —.04 
Middle class n EE Or com ^ ` “A 
15 48 42 26 43 55 
wer class 137 37 2 04 10** —.06 .62* .65** 
Total black VENUS 97-05. 80" ^ 84 45. 1 
Middle class 3 28 40 08 
Lower cl. oos 08 35 * TA 23 m 
clase MCN Lo 2: 36... 35 Al 
G -410 02 31 22 
Metropolitan California California Test
Achievement Achievement of Mental 
Tests Tests Maturity 
Reading Arith- Reading Math Language Non-Lan- Total 
Group M SD total metic total total IQ guage IQ IQ 
Clymer-Barrett Prereading Battery: total score 
»4 Total .21* St .22* A3 .03 AL .00 
Total Anglo Vias 8:0082** 5 .80f*  .80** .,.87** .68** ehe .80** 
Middle class 80.0 27.5 .53* QUARE TRIES dg 
Lower class 743 179 &ri( fat B e e i Cada id? -719** 61* AP d 
"Total Mexican American 73.0 24.6 -38* BOSTER S0: .89* 24 .38* 83 
Middle class 77.0 23.0 
Lower class 71.2 25.1 .44* 48** — 36 Bl 15 29 .22 
"Total black 840 23.0 48** 13 23 AT .06 04 -.01 
Middle class 90.7 18.4 .53* .54* .25 15 .39 .65** .62** 
Lower class 77.2 25.0 ,55** .12 .29 .30 14 —.06 —412 
Slosson Intelligence Test
Total Batter AAi ACEP. 1.) SE8* .70** .67** oem 
Total Anglo 104.7 17.4 .46* d Ee 67** 27 .60** 4t Vs 
Middle class 114.9 14.3 .59* 05 .59* 48 .65* 
Lower class 93.9 13.6 404. -« .70* Vibe 54 
Total Mexican American 91.2 15.8 .60** — .44* Al phe NOAH tsi -76** -15** ,80** 
Middle class 98.3 13.3 59 21 798% — 59* ,92** .61* 89% 
Lower class 844 15.1 .56* .35 .66** —.59* .56* 89^ Taget 
Total black 96.1 138  —07 A7 2 .19 .65** EM 57** 
Middle class 1014 15.3 =07 08 apre .56* -70** 
Lower class 909 9.5 32 .59** — .26 .35 .69** Al ,61* 
Slosson Oral Reading Test: total score
Total 4A4** 29° .98** — .35** .40** 8746.1 7,808" 
Total Anglo 87 6.5 .66** 31 19 —.03 —.14 .28 .00 
Middle class 117 641 13) 5/10 .38 A3 45 
Lower class 51 541 .63* .07 -04 —.18 
Total Mexican American 9.8 13.4 gut) ach. BESS .59* .26 44 
Middle class 11.8 16.8 66** — .63* 
Lower class 77 82 qi q38 
Total black 67 59 14 -16 .23 18 .55** .44* 49** 
Middle class 86 61 31 28 Aet .52* 61** 
Lower class 49 50 .32 .35 14 12 A9 49 .51* 
* p < .05.
** p < .01.


both relatively consistent and exceedingly 
high. The correlations for children from 
other groups, however, tend to be less con- 
sistent and lower; all correlations for lower- 
class black and Mexican American children 
are insignificant statistically. 

California Achievement Tests (CAT). 
Correlations between the MRT total score 
and the CAT (acquired at fourth grade) for 
the total sample are statistically significant 
(and comparable to those for the MAT) for 


Reading (.48) and Math (.38). Data on the 
three racial-ethnic groups indicate that the 
correlations for Anglos again are consistent 
and high, those for Mexican Americans are 
lower, and those for blacks are lowest but 
still significant statistically. Within each 
racial-ethnic group, the correlations for 
middle-class children are higher than those 
for lower-class children. MRT total score 
generally does not effectively predict 
fourth-grade Reading and Math scores for 
lower-class Mexican American and black
children.

California Test of Mental Maturity
(CTMM). Correlations between the MRT 
total score and the CTMM acquired at sec- 
ond grade for the total sample are statisti- 
cally significant for Language (.37), Non- 
Language (.42), and total (.22). A general 
pattern exists in the correlations for children 
from three racial-ethnic groups: They are
highest for Anglo children, while for Mexican
American and black children they are sig-
nificant but at lower levels of magnitude.
All MRT-CTMM correlations are signifi- 
cant for middle-class children but are not 
significant for lower-class Mexican American 
and black children. Therefore, the use of 
the MRT to predict later performance on the 
CTMM may be most accurate for Anglo 
children and moderately accurate for mid- 
dle-class Mexican American and black chil- 
dren; predictions for lower-class black and 
Mexican American children are not accu- 
rate. 


Correlations Between the Tests of Basic 
Experiences (TOBE) and Later Measures 
of Achievement and Intelligence 


MAT. Five of the six correlations be- 
tween the MAT Reading and Arithmetic and 
TOBE Language, Concepts, and their com- 
bined scores (Language and Concepts) are 
statistically significant for the entire sample. 
Significant correlations are apparent for 
Anglo children and largely limited to mid- 
dle-class Anglo children; scores from the 
Concepts and the combined Language and 
Concepts tests for middle-class Mexican 
American children also significantly predict 
their MAT Reading total scores. Thus, the
TOBE is a strong predictor of MAT perfor- 
mance for middle-class Anglos but generally 
does not significantly predict achievement, 
as measured by the MAT, for lower-class 
Anglo children and most groups of minority 
children in the second grade. 

CAT. Correlations between the TOBE 
Language and Concepts and the CAT 
Reading and Math tests are statistically 
significant for the entire sample; for the two 
separate TOBE subtests, only the Con- 
cepts-CAT Reading total correlation is sig-
nificant. Among the 36 correlations from
the separate SES X Racial-Ethnic Groups, 
only 6 are significant, 4 of which are for 
middle-class Mexican Americans and 2 for 
middle-class blacks. In general, the TOBE 
does not accurately and consistently predict 
CAT in fourth grade; at best, it may be 
somewhat useful in predicting reading 
achievement of middle-class Mexican 
Americans and blacks. 

CTMM. The three TOBE scores corre- 
late significantly with CTMM Language and 
Non-Language IQs but not with total IQ; the 
TOBE Language and Concepts test is a 
slightly better predictor than the two sepa- 
rate scales. The TOBE appears to be a 
particularly powerful predictor of Language, 
Non-Language, and total IQs for Mexican 
American middle-class children but is not an 
effective predictor for black children. 


Correlations Between the Clymer-Barrett 
Prereading Battery (CB) and Later 
Measures of Achievement and 
Intelligence 


MAT. Correlations between the CB total 
score and the MAT total Reading (.21) and 
Arithmetic (.19) test scores are low but sta- 
tistically significant for the entire sample. 
Five of the six correlations for the racial- 
ethnic groups are significant, with higher 
correlations for the Anglo children than for 
black and Mexican American children. 
Nine of the 10 correlations between the CB 
and MAT Reading and Arithmetic Tests for 
the SES X Racial-Ethnic Groups are sig- 
nificant; only the CB-MAT Arithmetic test 
correlation for lower-class black children is 
not significant. 

CAT. The correlation between the CB 
and the CAT Reading (.22) is statistically 
significant, while the CB-CAT Math total
correlation (.13) is not significant for the 
total sample. The CB appears to be ex- 
ceedingly accurate in predicting fourth- 
grade achievement of Anglo children and to 
a much lower degree for Mexican American 
children but is inaccurate for black children. 
Thus, for predicting fourth-grade CAT
performance, the CB is highly accurate for
Anglos, less accurate for Mexican Americans,
and inaccurate for blacks.

CTMM. Correlations between the CB
total and the CTMM are not significant for 
the entire sample. Correlations for the three 
racial-ethnic groups are consistently high 
only for Anglo children. The CB strongly 
predicts Non-Language and total IQs for 
middle-class blacks and moderately predicts 
Non-Language IQs for Mexican Ameri- 
cans. 


Correlations Between the Slosson 
Intelligence Test (SIT) and Later 
Measures of Achievement and 
Intelligence 


MAT. The correlations between the SIT 
and MAT Reading (.32) and Arithmetic (.42) 
test scores are statistically significant for the 
entire sample. Likewise, the correlations are 
highly significant for the Anglo and Mexican 
American groups, although they are very low 
for the total black sample. Data from the 
SES x Racial-Ethnic Groups indicate that 
the SIT IQ correlates most significantly with 
Arithmetic test scores for lower-class Anglo 
and black children. 

CAT. Correlations between the SIT IQ 
and CAT for the entire sample are highly 
significant for Reading (.66) and Math (.52). 
Correlations are consistently high for mid- 
dle- and lower-class Mexican Americans on 
both Reading and Math. The SIT signifi- 
cantly predicts CAT Reading (but not Math) 
for middle- and lower-class Anglos; it pre- 
dicts neither for blacks. 

CTMM. The most striking set of corre- 
lations exists between the SIT and the 
CTMM IQs. The correlations are signifi- 
cant for the entire sample as well as for most 
of the SES X Racial-Ethnic Groups. The 
best predictor of IQ in the second grade is the 
SIT IQ score in the first grade. 


Correlations Between the Slosson Oral 
Reading Test (SORT) and Later 
Measures of Achievement and 
Intelligence 


MAT. Correlations between the SORT 
and the MAT Reading (.44) and Arithmetic 
(.29) tests are significant for the total sample. 
As expected, the SORT correlates higher 
with Reading than Math. The SORT is 
particularly useful in predicting achievement
in both Reading and Math at second grade
for Mexican American children and Reading 
for Anglo children. The SORT-MAT cor- 
relations for blacks are not significant. 

CAT. Similar relationships generally 
hold for fourth-grade achievement. The 
SORT significantly predicts Reading (.38) 
and Math (.35) achievement test scores for 
the entire sample. However, its accuracy in 
predicting achievement for the racial-ethnic 
groups is limited to that of Mexican Ameri- 
can children. 

CTMM. The SORT correlates signifi- 
cantly with the CTMM for the entire sample 
and most consistently and accurately for 
blacks. Correlations reported for other ra- 
cial-ethnic groups generally are not signifi- 
cant. 


Discussion


Achievement and Intelligence


Achievement at second grade. Several 
trends are apparent in the data. Six of the 
seven readiness measures correlate signifi- 
cantly with both Reading and Math 
achievement test scores at the second grade 
as measured by the Metropolitan Achieve-
ment Tests for the total group. While 93%
of these correlations are significant, the 
predictive validity coefficients are quite 
different for the various readiness measures. 
The best predictor of second-grade 
achievement scores on the Metropolitan 
Achievement Tests is the Metropolitan 
Readiness Tests, while the predictive va- 
lidity of the Clymer-Barrett Prereading 
Battery and the Tests of Basic Experiences 
are considerably lower and more variable. 
Among the seven readiness tests, the median
correlation for predicting Reading scores is
.39 and for predicting Math scores is .26 at
second grade. Thus, the measures tend to
be somewhat more effective in predicting
subsequent achievement in reading than in
math.

Achievement at fourth grade. Five of the 
seven readiness measures correlate signifi- 
cantly with Reading and Math test scores as 
measured by the California Achievement 
Tests at Grade 4. The validity coefficients 
vary more widely among the tests at fourth 
grade than at second grade. The correla-
tions between the Slosson Intelligence Test
and the California Achievement Tests have
increased 34 points for Reading and 20
points for Math, representing a sizable im-
provement in this test’s effectiveness in
predicting achievement from second to fourth
grade. The Metropolitan Readiness Tests
also remain a relatively strong predictor at 
fourth grade. This indicates that the Met- 
ropolitan Readiness Tests not only corre- 
spond well with their companion measure of 
achievement (i.e., the Metropolitan
Achievement Tests) but also with a non- 
companion measure. The median correla- 
tions with Reading (.36) and Math (.32) test 
scores among the seven measures are roughly 
comparable to those found at second grade. 
It may be of interest to note that there is a 
perfect rank-order correlation between the 
tests’ ability to predict reading and math at 
the fourth-grade level. 
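
The rank-order claim can be checked in part against the total-sample coefficients quoted in the Results text. The sketch below is an illustration under stated assumptions: it uses only the four measures for which both CAT coefficients appear in the running text (MRT, CB, SIT, and SORT), omits the three TOBE scores, and computes a Spearman coefficient with helper functions whose names are hypothetical and which assume no tied values.

```python
def ranks(values):
    """Rank values from highest (1) to lowest; assumes there are no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman rank-order correlation for untied data."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Total-sample correlations with the CAT as quoted in the Results text.
cat_reading = {"MRT": .48, "CB": .22, "SIT": .66, "SORT": .38}
cat_math = {"MRT": .38, "CB": .13, "SIT": .52, "SORT": .35}

tests = list(cat_reading)
rho = spearman_rho([cat_reading[t] for t in tests],
                   [cat_math[t] for t in tests])
print(rho)  # 1.0 for these four measures
```

For these four measures the ordering is the same for Reading and Math (SIT, MRT, SORT, CB), which agrees with the statement above as far as the quoted values go.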

Intelligence at second grade. The read-
iness tests differ significantly in pre-
dicting intelligence at the second grade,
with correlations ranging from .73 to .00
(with a median correlation of .14). Three
tests appear most promising in this regard.
The Slosson Intelligence Test and the Met-
ropolitan Readiness Tests also correlate
significantly with the California Test of
Mental Maturity, particularly for black
children. The Metropolitan Readiness Tests
also correlate [...] with this measure,
with [...] of the correlations being significant
at the .01 level.


Racial-Ethnic Groups 


The predictive validity of readiness mea-
sures tends to be highest for Anglo children,
then for Mexican American children, and
lowest for black children. Support for this
generalization is provided by considering the
number of significant correlations obtained
within the three racial-ethnic groups as well
as the tendency for one racial-ethnic group
to consistently have higher correlations. If
the tests were unbiased, the number of sig-
nificant correlations would be approximately
the same for each of the three groups. The
number of significant correlations is 35 for
Anglo children (24%), 30 for Mexican
American children (20%), and 13 (9%) for
black children. These figures strongly
suggest that as a group, the readiness mea-
sures tend to be more valid for Anglos than
for blacks.

Another sign of test bias comes from con- 
sidering the tendency for one racial-ethnic 
group to consistently have higher correla- 
tions than the other two groups. The 
number of times each of the three groups is 
ranked first (i.e., has the highest correlation)
in the set of three correlations should be 
approximately the same if the tests are un- 
biased. Among the 49 correlations, Anglos 
have the highest correlation on 32 (65%), 
Mexican Americans on 15 (31%), and blacks 
on 2 (4%). 
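
The arithmetic behind these percentages, and the count each group would receive if first-place ranks were spread evenly, can be sketched as follows. The variable names are hypothetical; the counts are those reported in the paragraph above. No significance test is implied, since the 49 sets of correlations are not independent observations.

```python
# First-place counts among the 49 sets of three correlations (from the text).
first_place = {"Anglo": 32, "Mexican American": 15, "Black": 2}

total = sum(first_place.values())
# Count each group would receive if first-place ranks were shared equally.
expected = total / len(first_place)

shares = {}
for group, count in first_place.items():
    shares[group] = round(100 * count / total)
    print(f"{group}: {count} of {total} ({shares[group]}%), "
          f"expected about {expected:.1f} if ranks were evenly spread")
```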

SES differences within racial-ethnic
groups.¹ Within each racial-ethnic group,
the predictive validity coefficients clearly 
indicate that these readiness measures tend 
to have greater predictive validity for mid- 
dle-class than for lower-class children. 
Among the 83 coefficients that are statisti-
cally significant, 60 (72%) favor middle-class
children. Among the 19 validity coefficients 
that are negative, 14 (74%) are with lower- 
class children. 

Additional evidence of possible bias comes 
from determining the number of correlations 
that are higher for either middle-class or 
lower-class children. Among the 100 sets of 
validity coefficients, 72 are higher for mid- 
dle-class children, 24 are higher for lower- 
class children, and 4 are the same. This 
general trend is apparent within each of the 
three racial-ethnic groups: For Mexican
Americans, 82% of the coefficients are higher
for middle-class children; for Anglos, 75% are
higher for middle-class children; and for
blacks, 70% are higher for middle-class
children.


Making Decisions on Individual Children 


A simple and direct measure of the effi-
ciency in using readiness measures to predict
later performance can be obtained by
squaring the correlations in Table 2; r² pro-
vides an estimate of the relative reduction in
error achieved through the use of the test. If
one accepts the relative reduction in error of
49% as a realistic goal for employing a test,
a correlation ≥ .70 becomes the criterion to
use in examining the data to determine the
relative effectiveness of these measures in
making predictive decisions regarding indi-
vidual children. Only 2% of the correlations
for the total sample are ≥ .70. However, if
one accepts the relative reduction in error of
25% (and, correspondingly, a correlation
≥ .50 as the criterion), then 10% of the cor-
relations for the total group meet this crite-
rion. The percentage of correlations that
meet this criterion for the subgroups is as
follows: total Anglos, 65%; middle-class
Anglos, 74%; lower-class Anglos, 50%; total
Mexican Americans, 33%; middle-class
Mexican Americans, 65%; lower-class Mex-
ican Americans, 20%; total blacks, 8%; mid-
dle-class blacks, 26%; and lower-class blacks,
[...]%.

¹ The confidence one can have in the data is in part
a function of the size of the sample. As the numbers
within each cell decrease, the interpretation should be
more cautious. The strongest interpretations can be
made from data representing the total group, the next
from each of the three racial-ethnic groups, and the next
from the six SES X Racial-Ethnic Groups.

In general, these data suggest that indi- 
vidual decisions based on these test data 
typically should be made with care. Fur- 
thermore, individual decisions tend to be 
warranted more frequently with middle- 
class than with lower-class children, more 
frequently with Anglos, less frequently with 
Mexican Americans, and least frequently 
with blacks. 
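
The criterion described above, in which a target reduction in error is converted to a minimum correlation by taking its square root, can be sketched as follows. The function names are hypothetical, and the list holds only the seven MRT total-sample coefficients quoted in the Results text rather than the full set of correlations in Table 2.

```python
import math


def reduction_in_error(r):
    """Relative reduction in error as used in the text: the squared correlation."""
    return r ** 2


def criterion_r(target_reduction):
    """Smallest correlation whose square reaches the target reduction."""
    return math.sqrt(target_reduction)


def share_meeting(correlations, criterion):
    """Fraction of the correlations that are at or above the criterion."""
    return sum(1 for r in correlations if r >= criterion) / len(correlations)


print(round(criterion_r(0.49), 2))  # 0.7
print(round(criterion_r(0.25), 2))  # 0.5

# MRT total-sample coefficients quoted in the Results text.
mrt_total = [.43, .44, .48, .38, .37, .42, .22]
print(share_meeting(mrt_total, criterion_r(0.25)))  # 0.0
print(round(reduction_in_error(max(mrt_total)), 2))  # 0.23
```

Even the largest of these coefficients (.48) corresponds to a reduction in error of about 23%, short of the 25% criterion, which is consistent with the caution urged above for individual decisions.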

An alternate explanation for the lower 
coefficients among low SES minority chil- 
dren is that school interventions have a dif- 
ferent impact on these children; educational 
programs may act to redistribute the or-
dering of their achievement. This expla-
nation would hold that the scores from the
readiness measures may be a valid reflection
of children's abilities at the beginning of
school and that long-term predictive validity
cannot be determined accurately because 
multiple educational interventions affect 
children in different ways. However, there 
presently is little evidence for the differential 
effectiveness of educational interventions 
provided by public schools for children from 
different SES groups (Coleman et al., 1966). 
Also, the fact that some readiness measures
are good predictors of later performance for
children from each Racial-Ethnic X SES
Group severely weakens this alternative
explanation.


Conclusions 


These results generally suggest a number 
of points. The predictive validity of readi- 
ness measures may need to be examined 
separately for subpopulations. One cannot 
assume that the validity coefficients for the 
total population adequately represent the 
validity coefficients for persons from various 
racial-ethnic or SES groups. The tests’ va- 
lidity coefficients frequently are sufficiently 
high to justify their use in making group 
decisions. However, the coefficients tend to 
be too small to be used justifiably to make 
individual decisions, particularly with mi- 
nority-group children. 

A related point deserves note even though 
it goes beyond the scope of this study: In- 
adequate attention to racial-ethnic and SES 
characteristics often occurs in standardizing 
tests and in reporting their norms. The 
presently accepted procedure of drawing a 
standardization sample that proportionately 
represents selected demographic charac- 
teristics may yield a representative sample 
of persons nationally but does not ade- 
quately provide for different characteristics 
displayed by some subgroups. Years ago, 
Davis (1948) reported that significant 
numbers of items from existing aptitude 
tests differentially discriminated between 
middle-class and lower-class Anglo children. 
More recently, Green (Note 1) reported that
between 15% and 70% of the items from a
major achievement battery evidenced bias
when the performance of northern urban
blacks, southern rural blacks, and south-
western Mexican Americans was contrasted
with that of Anglos.

During the recent revision of the Wechsler 
Intelligence Scale for Children-Revised, the 
standardization sample was drawn to rep- 
resent the relative proportion of Anglos, 
blacks, and other nonwhite children within 
the United States. Among the 2,200 chil- 
dren included in the standardization sample 
were 305 blacks. While the proportionate 
sampling procedures may accurately sample 
bodies, the practical effect of including rel-
atively few minority children is that their
unique characteristics tend to have a minor 
influence on the prevailing characteristics 
manifested by the majority group. The 
characteristics of the majority group clearly 
tend to dominate. Furthermore, the num- 
ber of blacks included in the standardization 
sample is too small for separate black norms.
Thus, in standardizing a test, consideration
should be given to drawing a larger propor- 
tion of children from minority and lower 
SES groups in order to provide a larger data 
base, thus enabling psychologists to report 
detailed norms and estimates of reliability 
and validity for racial-ethnic and SES groups 
and to more adequately examine racial- 
ethnic and SES biases. 


Reference Note 


1. Green, D. Racial and ethnic bias in test construction
(Tech. Rep.). Monterey, Calif.: California Test
Bureau/McGraw-Hill (undated). 


Pictures, Imagery, and Retarded Children's
Prose Learning 


Bruce G. Bender and Joel R. Levin 


University of Wisconsin—Madison 


Ninety-six educable mental retardates, ages 10 to 16 years, were randomly as- 
signed to one of four experimental conditions to listen to a 20-sentence story. 
Picture subjects viewed illustrations of the story, imagery subjects were in-
structed to generate mental pictures of the story, repetition control subjects
heard each sentence of the story twice, and control subjects simply listened to
the story once. Planned comparisons revealed that picture subjects recalled
more story information than did subjects in all other groups. Differences
among the other conditions, age by conditions interactions, and age differ-
ences per se were not statistically significant. A number of theoretically and
practically interesting issues are discussed in the context of recent prose learn-
ing findings with normal children.


The competent learner, when faced with 
a learning task, will usually first assess its 
demands and then engage in an appropriate 
information-processing strategy. The re- 
tarded learner, in contrast, is a much more 
passive participant, often failing to think 
about the material in any meaningful way 
that will facilitate memory for its content. 
Brown (1974) proposed that normal-re- 
tardate learning differences can frequently 
be traced to the failure of retardates to en-
gage in any strategic behavior. She offers
evidence, however, that educable mentally
retarded (EMR) adolescents can learn at
near-normal levels when instructed in the
use of a strategy to mediate their learning—a
finding that is consistent with the notion
that retardates suffer from a production
deficiency (Flavell, 1970).

Other investigators have demonstrated 
that the learning of EMR subjects increases 
significantly when they are provided with an 
appropriate information-processing strategy. 
These studies have focused primarily on 
paired-associate learning, demonstrating 
that EMRs can improve their learning of 
to-be-associated items when supplied with 
sentence mediators (Turnure & Walsh, 
1971), when provided with pictures of pairs 
of objects interacting (Milgram & Riedel, 
1969), and when instructed to generate an 
interactive mental image of separate objects 
or pictures (Lebrato & Ellis, 1974; Yarmey 
& Bowen, 1972). 

Although the effects of mediational 
strategies have been examined with respect 
to the learning of arbitrary associates, little 
is known about the potential facilitation of 
retardates’ recall of an actual story when a 
mediational strategy is introduced. As far 
as normal children are concerned, experi- 
menter-provided pictures and subject-gen- 
erated imagery have both been found to 
improve prose learning performance. 
However, whereas pictures are known to 
facilitate the prose learning of normal chil- 
dren at all grade levels including kinder- 
garten (Levin & Lesgold, in press), the abil- 
ity to benefit from a mental imagery strategy 
seems to be developmental in nature. In
particular, not until about third grade can
normal children successfully employ an im- 
agery strategy to improve their prose learn- 
ing, unless special techniques are devised 
(Dunham & Levin, in press; Guttmann, 
Levin, & Pressley, 1977; Shimron, 1974; 
Ruch & Levin, Note 1). This is in contrast 
to the paired-associate results, where chil- 
dren about 2 or 3 years younger benefit from 
an imagery strategy (Levin, 1976). 

Lesgold, Levin, Shimron, and Guttmann
(1975) speculate that this lag (between
paired-associate and prose learning) in 
children’s ability to employ an imagery 
strategy may be due to the additional re- 
quirement of keeping track of the theme of 
the story, including intersentence relation- 
ships. In this sense, then, differences asso- 
ciated with the two tasks can be attributed 
to “complexity” differences, in accordance 
with Pascual-Leone’s (1970) neo-Piagetian 
model. Having to process sentence-by- 
sentence information while simultaneously 
keeping track of intersentence information, 
story structure, and theme would certainly 
be expected to place heavier demands on 
subjects’ limited working memories, in 
comparison to what is required in learning 
a set of arbitrary associates or unrelated 
propositions. In the present context, since 
EMR children may be presumed to be less 
effective learners in comparison to normals, 
it is not appropriate to generalize across 
tasks and subject populations when deciding 
which props and strategies will facilitate
EMRs’ prose learning.

The present investigation focused on
experimenter-provided pictures and
subject-generated imagery as candidates that,
respectively, might and might not be success-

Two control groups were also employed,
one to compare directly with the picture
condition and one with the imagery condition.
In the former repetition control condition,
subjects heard each sentence of the
story twice. Such experimenter-provided
repetition controls for the possibility that
pictures do little more than provide a second
exposure to the story and has been
demonstrated to be helpful with normal children
(Levin, Bender, & Lesgold, 1976; Ruch &
Levin, 1977). The latter control group was
allowed to hear the story only once and was
given no special props or instructions; the
initiation of any learning strategy, such as
silent rehearsal, was left up to the subject.

In order to judge whether subjects were
“comprehending” as well as “rotely
remembering” information in the story, two
types of questions were used for testing.
Verbatim questions contained words taken
verbatim out of the original passage.
According to Anderson (1972), such questions
can be answered by matching their surface
elements with those of the original
communication even in the absence of complete
comprehension of the passage’s content.
Paraphrase questions, on the other hand,
contained statements whose meanings were
equivalent to the original statements but
which were composed of synonyms of the
substantive words previously used in the
story. Since memory for the sound of the
exact words used in the passage is not
helpful, it is assumed that these questions can be
answered only if the passage was understood.

Ruch and Levin (1977) and Peng and
Levin (Note 2) found that
experimenter-provided pictures facilitated normal second-
and third-grade children’s recall of both
verbatim and paraphrase cued story
information. Interestingly, however, Ruch and
Levin also found that experimenter-provided
repetitions facilitated verbatim,
though not paraphrase, cued information.
These data are consistent with the
interpretation that pictures lead to a more
complete processing of story material in
comparison to simple repetition, at least with
normal children. In the present study, this
interpretation was examined with respect to
EMR children.


Method 
Subjects 


Ninety-six subjects (age range of 10 years 0 months
to 16 years 11 months) were taken from public school
special education classes in Madison, Wisconsin.
Subjects were divided into a group of 48 older (mean
chronological age = 15.0 years; mean IQ = 69.4) and 48
younger (mean chronological age = 12.0 years; mean
IQ = 72.3) children. All subjects had been classified as
educable mentally retarded by the school system and
evinced no clinical abnormalities (neurological damage,
severe sensory defects, or physical stigmata).
Design and Materials


A 20-sentence, fictitious story was adapted from a
longer story used by Pressley (1976). Each sentence
was constructed so that it contained an item of
information of a unique nature (e.g., a dog is holding a banjo)
that could be requested with a single question. A
colored, cartoonlike drawing was constructed for each of
the 20 sentences depicting the events in the story.

A question was formulated to measure recall of
information contained in each of the 20 sentences of the
story. Each question was produced in two forms, ver- 
batim and paraphrase. For example, for the original 
sentence, “A dog holding a banjo came running up to the 
gate,” the verbatim question was “What was the dog 
holding as he ran up to the gate?” and the paraphrase 
question was “What was the hound carrying as he ar- 
rived at the entrance?” 

In order to minimize the cumulative effects of the 
story sequence and avoid the possibility of penalizing 
subjects who could not answer a preceding question, 
most questions provided information sought in previous 
questions (Levin, 1973). For example, the answer to 
the question, “How did the dog get past the gate?” was 
that he pushed Joseph (the gatekeeper) out of the way 
and crawled under it. The next question incorporated 
part of this answer by asking, “What did Joseph do after 
the dog pushed him out of the way and got past the 
gate?” All questions were read to a group of normal 
pilot subjects who had not heard the story. The failure 
of subjects in this group to guess correct answers verified 
that information contained in the questions was not 
available via prior knowledge or associations (see 
Guttmann et al., 1977). 

Each subject was randomly assigned to one of four 
experimental conditions: (a) the picture condition, 
where subjects viewed a picture while hearing each 
sentence of the story; (b) the imagery condition, where
subjects were instructed to generate a mental picture 
for each sentence of the story; (c) the repetition control 
condition, where subjects listened to each sentence twice 
in succession; and (d) the control condition, where 
subjects listened to the entire story without pictures or
special instructions.

A three-sentence practice story, along with instruc- 
tions and props appropriate for the subject’s condition, 
was given and was followed by sample questions. The
practice story and questions were recorded on tape as
were the actual story and questions. For the actual
story, one of two question orders was employed, with
each order containing 10 verbatim and 10 paraphrase 
questions. Question types were randomly interspersed 
across the 20 questions and reversed between orders. 


Procedure 


Subjects were tested individually in a room with only 
the experimenter present. All subjects were told ini- 
tially that they were going to hear a story and that af- 
terward they would be asked to answer some questions 
about the story. Following thorough instructions and 
the practice story, subjects listened to the actual story. 
Subjects in the imagery condition were reminded to
imagine at three preselected spots during the story, so
that performance failure could not be attributed to a
simple “strategy discontinuance” explanation.
Immediately after the story was completed, the questions
were played, and the subject’s responses were recorded
on an answer sheet by the experimenter. When testing
was completed, each subject was asked not to tell his or
her classmates about the story or testing activities.

Table 1
Mean Percentage Correct in Each of the Four Conditions by Question Variation

                                  Condition
Question                                   Repetition
variation            Picture    Imagery     control     Control
Verbatim              67.50      36.67       43.13       37.50
Paraphrase            59.79      33.33       40.00       30.00
Across variations     63.64      35.00       41.56       33.75


Results 


Each question was assigned a value of 1 
point for scoring. Thus, the maximum score 
was 20 points, or 10 verbatim and 10 para- 
phrase points. Half-point credit was given 
for responses in which some but not all of the 
information was correct (e.g., in a question 
for which the answer was a basketball, some 
subjects gave baseball as an answer), or when 
it was incompletely specified (e.g., for a 
previous example, the dog pushed the gate- 
keeper out of the way and crawled under the 
gate). Decisions determining the criteria for 
the assignment of zero-, half-, or whole-point
scoring were based upon the judgments of 
two independent raters, who were not in- 
formed of the experimental conditions in 
which the answers appeared. For the most 
part, these decisions were straightforward, 
and any discrepancies were resolved through 
discussion. Internal consistency estimates 
of the total test’s reliability (Winer, 1962) 
ranged from .72 to .92 for subjects in the four 
experimental conditions. 
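
The scoring rule described above can be sketched in a few lines. This is only an illustration, not the authors' procedure: the function name `percent_correct` and the sample ratings are hypothetical, and only the zero, half, and whole point values and the 10 + 10 question split come from the text.

```python
# Credit values allowed per question, as described in the text.
ALLOWED_CREDIT = {0.0, 0.5, 1.0}


def percent_correct(ratings):
    """Convert per-question credits (0, 0.5, or 1) into percentage correct."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for credit in ratings:
        if credit not in ALLOWED_CREDIT:
            raise ValueError(f"unexpected credit value: {credit}")
    return 100.0 * sum(ratings) / len(ratings)


# Hypothetical subject: 10 verbatim and 10 paraphrase questions.
verbatim = [1, 1, 0.5, 0, 1, 0, 0.5, 1, 0, 1]
paraphrase = [1, 0, 0, 0.5, 1, 0, 0, 1, 0.5, 0]

verbatim_pct = percent_correct(verbatim)
paraphrase_pct = percent_correct(paraphrase)
total_pct = percent_correct(verbatim + paraphrase)
```

Averaging such percentages over the subjects in a condition would give entries of the kind reported in Table 1.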

Mean performance in each of the four 
conditions (expressed as percentage correct) 
is presented in Table 1 separately for ver- 
batim and paraphrase questions. For each 
question variation, three families of com- 
parisons were evaluated. First, the condi- 
tions effect was assessed on the basis of the 
six possible pairwise comparisons involving 
the four conditions. Second, these same six
comparisons were examined in terms of their 
interaction with age (older vs. younger 
EMRs). Finally, the age main effect itself 
was evaluated. Thus, in all, 13 planned 
comparisons were conducted for each ques- 
tion variation (with each based on 88 df). 
The Type I error rate (α) was set equal to .01
per comparison, yielding an experimentwise
α of .13 or less per question variation (Kirk,
1968). This approach was adopted to
address the present research questions more
directly, while controlling the
experimentwise α at about the same level as would have
been obtained using traditional methods of
analysis for a two-factor design. In that
case, assessing the main effects of condition
and age as well as their interaction, each at
the .05 level, would produce an
experimentwise α of .15 or less.
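
The two experimentwise figures can be checked with a short calculation. The sketch below assumes the "or less" bounds come from the Bonferroni inequality (number of comparisons times the per-comparison rate); the function names are hypothetical, and the second function shows the rate that would hold if the comparisons were independent, which the text does not claim.

```python
def bonferroni_bound(alpha_per_comparison, n_comparisons):
    """Upper bound on the experimentwise Type I error rate."""
    return min(1.0, alpha_per_comparison * n_comparisons)


def rate_if_independent(alpha_per_comparison, n_comparisons):
    """Experimentwise rate under the added assumption of independent tests."""
    return 1.0 - (1.0 - alpha_per_comparison) ** n_comparisons


# 13 planned comparisons at .01 each (the approach used here).
planned_bound = bonferroni_bound(0.01, 13)
planned_independent = rate_if_independent(0.01, 13)

# 3 tests at .05 each (two main effects and their interaction).
traditional_bound = bonferroni_bound(0.05, 3)
traditional_independent = rate_if_independent(0.05, 3)
```

The bounds come out at .13 and .15, which is the sense in which the two approaches control the experimentwise rate at about the same level.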

Since none of the comparisons involving 
age was significant, the data are discussed 
only in terms of the across-age conditions 
differences, as presented in Table 1. Con- 
sideration of the age variable will be given in 
the Discussion section. 

As may be inferred from Table 1, subjects
in the picture condition performed better
than subjects in each of the other three
conditions. The differences were significant
for both verbatim questions (all ts > 3.52; ps
< .01) and paraphrase questions (all ts >
2.98; ps < .01). Moreover, no differences
among the three other conditions were
significant for either question variation (all |t|s
< 1.51; ps > .10). Thus, the results are quite
straightforward, with statistically
comparable performance profiles produced by the
two question variations.


Discussion 


The effects of showing pictures to
children while they listened to a story were
striking, amounting to 89% facilitation
relative to the no-picture control group’s
average. Moreover, since the effect was at
least as striking on paraphrase questions as
on verbatim questions, it cannot be attributed simply
to subjects’ memory for surface-level
phonological information. The provision of a
pictorial adjunct to a population of
inefficient learners increases the amount of
information they are able to remember. Thus,
the use of visual illustrations by special
educators, in books as well as classroom aids,
warrants continued investigation.¹
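
The 89% figure can be reproduced from the across-variation means in Table 1. The sketch below assumes that "facilitation" means the gain of the picture condition over the control condition, expressed as a percentage of the control mean; the function name is hypothetical.

```python
def percent_facilitation(treatment_mean, control_mean):
    """Gain over the control condition as a percentage of the control mean."""
    return 100.0 * (treatment_mean - control_mean) / control_mean


# Across-variation means from Table 1 (percentage correct).
picture_across = 63.64
control_across = 33.75

facilitation = percent_facilitation(picture_across, control_across)
```

With these inputs the result is a little under 89, in line with the figure quoted in the text.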

Of equal interest as the picture effects, but
for other reasons, is the complete lack of
facilitation due to either experimenter-provided
repetitions or visual imagery instructions.
Concerning the former, in contrast to
previous research with normal children
(Levin et al., 1976; Ruch & Levin, 1977),
simply repeating each sentence of the story
did not boost the present EMR children’s
performance. Thus, whatever mechanism
is activated by repetition in normals (e.g.,
attention or rehearsal) was not activated
here; or at least, it did not materialize in
learning gains. This lack of facilitation due
to repetition among both younger and older
EMRs should be of some importance to
researchers and practitioners in the special
education field, since there appears to be
scanty systematic data addressing this
question in EMR populations (see Brown,
1974).

The failure of imagery instructions to
improve recall is similarly interesting in that
it ties in with the finding that younger
(developmentally less advanced) normals do not
benefit from imagery instructions on the
same task (e.g., Dunham & Levin, in press;
Guttmann et al., 1977; Shimron, 1974). In
this regard, it is difficult to attribute the
failure to any unique characteristics of the
present passage. First, although the
particular story used here differed from those in
previous experiments, it nonetheless
conformed to Levin and Lesgold’s (in press)
specifications for enhancing the likelihood
of obtaining picture and imagery effects in
prose. Second, and more to the point,
supplementary data collected on a sample of
normal third graders did produce an
imagery-control difference with the same
passage.

Two other possible explanations of the
imagery strategy’s failure among EMRs may
also be dismissed. First, it cannot be argued
that subjects forgot to keep using the
strategy, since it will be remembered that three
strategy prompts were provided by the
experimenter throughout the passage.
Neither can it be argued that they simply lost
interest in applying the strategy, since an
analysis of performance on early and late
portions of the story did not reveal any
differential effects.

¹ These conclusions are also in accord with the results
of a study by Riding and Shore (1974), discovered after
the present work had been completed.

Thus, consistent with previous
speculations about imagery on this task (Lesgold,
Levin, Shimron, & Guttmann, 1975), it is
assumed that the complex process of
generating and regenerating images while keeping
track of the theme and events of a story is too
great a requirement for EMR children. This
is in contrast to the simpler operation of
generating discrete images for arbitrary
pairs, something with which EMRs are
successful (Lebrato & Ellis, 1974; Yarmey &
Bowen, 1972). Thus, it is not enough to ask
whether or not retarded children can
generate mental images or employ any other
mediational strategy. The degree to which
the retardate experiences a deficiency in his
or her ability to employ such strategies varies
with task characteristics, and “the more that
the nature of the task permits strategic
behavior in learning, or requires complex
learning, the greater the likelihood that
brighter ... nonretarded groups will
outperform retarded groups by a wide margin”
(Spitz, 1976, p. 49). It remains to be seen
whether extended practice (e.g., Pressley,
1976) or training (e.g., Lesgold, McCormick,
& Golinkoff, 1975) can be employed to help
EMR subjects learn to generate images while
listening to a story. That they can do so on
a paired-associate task (Lebrato & Ellis,
1974; Yarmey & Bowen, 1972) and on a prose
learning task when the experimenter
provides a picture for them to encode (Riding &
Shore, 1974; and the present results) strongly
suggests that the foundation skills for
effective imagery training are present.²

Finally, the finding that the average
performance of older EMRs (14 to 16 year olds)
did not surpass that of younger EMRs (10 to
14 year olds), while potentially intriguing,
must be interpreted with caution.³ Any
number of plausible hypotheses can be
offered to account for this finding, although
the available literature provides little
confirmatory evidence. Although it is possible
that cognitive development between these
two ages has been arrested for EMRs in
contrast to normals, a couple of artifactual
variables must first be considered. For one,
it is possible that the criteria used to place
children from this particular school district
into EMR categories varied over time, and
if present criteria are less stringent, the
younger children may be on the whole more
able. Somewhat related to this is the likely
possibility that with the current emphasis on
mainstreaming, especially in the upper
grades, the older children still categorized as
EMR are comparatively worse off than the
younger children. Such extraneous factors
must be examined before the notion of a
developmental ceiling is accepted.


² In all of the successful imagery attempts using
EMRs, concrete stimulus support (in the form of pic- 
tures or objects) has been required. Indeed, Lebrato 
and Ellis (1974) could not facilitate EMRs’ performance 
on a purely verbal paired-associate task—a result con- 
sistent with other imagery development findings (see 
Levin, 1976). 

³ An examination of the relationship between
subjects’ actual ages and recall in the control condition 
(replacing the simpler older-younger distinction) leads 
to the same conclusion, since r = .14 across question 
variations. 
Mathematics Anxiety Rating Scale: Predicting Anxiety
Experiences and Academic Performance in 
Two Groups of Students 


Larry W. Morris, Dale S. Kellaway, and Donna H. Smith 
Middle Tennessee State University 


The Mathematics Anxiety Rating Scale (MARS) was administered to 52 psy- 
chology students and 54 mathematics students. Preceding course examina- 
tions in statistics and mathematics courses, worry (cognitive concern about 
test performance), and emotionality (physiological and affective arousal) were 
assessed. MARS scores were found to be higher for psychology students than 
for mathematics students, to be useful predictors of both worry and emotion- 
ality, and to be inversely related to performance for psychology students. A 
strong inverse relationship was found between worry and performance for 
both groups and between emotionality and performance for psychology
students.


Despite the repeated exposure of psy- 
chology instructors to numerous capable 
students who “block” and thus fail to per- 
form well or at least suffer noticeably in 
statistics courses, there has been very little 
research on the nature, cause, or effects of 
mathematics anxiety. There have been a 
few recent studies on the effects of general 
or test anxiety on math performance, but the 
results have been inconclusive. Neither 
Hambleton and Traub (1974) nor Towle and 
Merrill (1975) were able to find a relationship
between Achievement Anxiety Test
scores and math performance, and Szetela
(1973) found the effects of test anxiety on 
math performance to be marginal. 
McCormick (1975) found both general anx- 
iety and test anxiety to be less closely related 
to math aptitude than to reading aptitude, 
whereas self-esteem was more closely related 
to math aptitude. Similarly, one study of 
trait and state anxiety found only state 
anxiety, in combination with low ability, to 
be debilitative (Rappaport, 1975); while 
another study found only trait anxiety,
interacting with instructional technique, to be
debilitative (Papay, Costello, Hedl, &
Spielberger, 1975).

Data were collected by the second and third authors
in partial fulfillment of the Master of Arts degree
requirements at Middle Tennessee State University.

Requests for reprints should be sent to Larry W.
Morris, Department of Psychology, Middle Tennessee
State University, Murfreesboro, Tennessee 37132.

As early as 1957, Dreger and Aiken 
pointed out that number anxiety, a “syn- 
drome of emotional reactions to arithmetic 
and mathematics” (Dreger & Aiken, 1957, p. 
344), assessed by a 3-item scale, constituted 
a separate factor from general anxiety and 
was correlated negatively with final grades 
in math classes. Szetela (1973) found dif- 
ferences between test anxiety groups and 
between males and females on a 10-item 
Mathematics Debilitating Anxiety Scale, but 
no correlation between math anxiety and 
math performance was reported. 

Richardson and Suinn (1972), noting that 
many individuals who do not ordinarily 
suffer from other tensions do suffer from 
math anxiety and that over one third of the 
students who had responded to their be- 
havior therapy program offered through the 
university counseling program indicated that 
their problem was based on mathematics 
anxiety, reported the development of a 
measure of mathematics anxiety. The 
Mathematics Anxiety Rating Scale (MARS; 
Richardson & Suinn, 1972) assesses anxiety 
associated specifically with the use of num- 
bers and mathematical concepts. Norma- 
tive, reliability, and validity data were 
gathered initially by Richardson and Suinn 
and successfully replicated by Suinn, Edie, 
Nicoletti, and Spinelli (1972). MARS scores 
have been shown to decrease among clients
volunteering for the treatment of math
anxiety as a function of systematic desensitization,
accelerated desensitization, and
anxiety management training (Richardson 
& Suinn, 1973; Suinn & Richardson, 1971). 
One of the advantages of the MARS, which 
presents numerous situations that individ- 
uals may face in their daily activities, is that 
it is useful in the construction of anxiety 
hierarchies for treatment purposes, as 
demonstrated by Hyman (1973). 
The purpose of the present study is to 
further the exploration of mathematics 
anxiety, utilizing the MARS, by an analysis 
of MARS score correlates in actual academic 
situations. The rationale of the worry- 
emotionality distinction in test anxiety 
theory (Liebert & Morris, 1967) was used as 
a source of hypotheses for the investigation. 
This position asserts that anxiety experi- 
enced in the testing situation consists of at 
least two separable components: worry and 
emotionality. Worry is defined as cognitive 
concern about performance and its conse- 
quences and has been shown to vary as a 
function of performance expectancy (Spie- 
gler, Morris, & Liebert, 1968), test impor- 
tance, perceived difficulty of the test, and 
feedback conditions (Morris & Fulmer, 
1976). Emotionality, defined as physio- 
logical and affective arousal, is much less 
consistently affected by such cognitive con- 
siderations and varies rather as a function of 
conditions such as threat of electric shock 
(Morris & Liebert, 1973). These
components can be identified in the reactions of
preschoolers (Morris, Brown, & Halbert,
1977) and school children (Hagtvet, 1976) to
evaluative situations. Further, the
differential effect of worry and emotionality on
performance is well documented
(Deffenbacher, 1977; Doctor & Altman, 1969; Morris
& Liebert, 1969, 1970; Morris, Smith,
Andrews, & Morris, 1975). As Wine (1971) and
Sarason (e.g., 1975) have pointed out, the
high-test-anxious student ruminates over his
or her inadequacy and is distracted by
self-deprecatory thoughts, that is, worries, and
thus performs poorly. Emotionality has
little effect on performance unless it is high
enough to be distracting. Recently,
applications of the worry-emotionality
conceptualization in anxiety theory have been
made in the area of assessment (see
Spielberger, Gonzalez, Taylor, & Anton’s, Note 1,
newly developed Test Anxiety Inventory)
and treatment (Finger & Galassi, 1977) of
test anxiety.

In addition to the above considerations,
the present study includes comparisons
between students voluntarily taking math
classes (math majors) and students required
to do so (psychology majors in statistics
classes). In a preliminary study, 30
undergraduate psychology majors or minors
enrolled in an introductory statistics class and
31 undergraduate mathematics majors or
minors enrolled in a sophomore-junior level
mathematics course were administered the
MARS during regular class periods.
Immediately preceding the final examination
in each course, subjects were asked to com- 
plete a preexamination worry—emotionality 
questionnaire. Psychology students had 
significantly higher MARS scores but lower 
worry and emotionality scores than math 
students. Correlations among MARS, 
worry, and emotionality scores were positive 
for both groups, while inverse relationships 
of math anxiety and worry with test perfor- 
mance were found for the math students 
only. As expected, there was no relationship 
between test performance and emotionality 
for either group of students. 

On the basis of the theoretical consider- 
ations advanced above and the results of this 
preliminary study, it was hypothesized that 
(a) math anxiety would be significantly 
higher among psychology students than 
math students; (b) there would be a
significant positive relationship between math
anxiety and both worry and emotionality
aroused in a test situation; and (c) there
would be a significant inverse relationship
between math anxiety and test performance
and between worry and test performance,
especially for psychology students.


Method 


The subjects were 54 mathematics students (16
females and 38 males) enrolled in three sophomore-junior
level courses and 52 psychology students (25 females
and 27 males) enrolled in two introductory statistics
courses. The multiple classes with different instructors
within each of the two groups were treated as intact
classes, without further attempts to match subjects, in
the interest of utilizing the actual academic situation
in which anxiety variables were operative. Subjects
were told that the purpose of the experiment was to
study the relationship between mathematics anxiety
and test performance and were asked for their
permission to use data for experimental purposes.

Table 1
Mean Scores for All Variables

                                  Math students    Psychology students
                                    (n = 54)            (n = 52)
Variable                            M       SD         M       SD     t ratio
MARS totalᵃ                      173.72   47.68     195.58   52.95     2.24*
Math Class Anxiety subscale       14.46    4.85      17.02    6.30     2.35*
Math Test Anxiety subscale        28.33    9.85      31.38    9.03     1.68
Math Studying Anxiety subscale    20.13    6.90      22.56    7.37     1.79
Perceived course difficulty        2.85    1.01       3.15    1.17     1.65
Grade aspiration                   4.04    1.12       4.50     .82     2.61*
Emotionality                      13.72    4.95      12.85    5.22      .86
Worry                             13.11    3.66      13.63    4.32      .69
First exam grade                  84.30   11.55      84.27   15.49      —
Final exam grade                  79.72   13.34      75.08   17.45      —
Course grade                      81.85   11.52      79.50   14.02      —

ᵃ MARS = Mathematics Anxiety Rating Scale.
* p < .05.

In order to minimize the effect of students’ experience
in the particular course utilized, the MARS was given
within the first 2 weeks of the semester during a regular
class period, and the worry-emotionality questionnaire
was administered immediately prior to the first course
examination. The MARS presents 94 potentially
anxiety-arousing situations to which the subject rates
his or her “usual” response on a 5-point scale: “not at
all,” “a little,” “a fair amount,” “much,” or “very
much.”

In addition to a total score, three subscales of the
MARS were identified and scored separately for the
purpose of identifying more specifically the math
anxiety differences between groups and their effects on
performance. Eight of the MARS items, such as
“walking into a math class,” were used to assess Math
Class Anxiety. The Math Studying Anxiety subscale
consisted of nine items such as “studying for a math
test.” Items such as “taking an examination (quiz) in
a math course” comprised the 10-item Math Test
Anxiety subscale.

The preexamination worry-emotionality
questionnaire administered immediately preceding the first
course examination consisted of five worry items and
five emotionality items. This scale was devised by
Liebert and Morris (1967) to assess the worry and
emotionality components of anxiety as aroused by the
test situation. This scale is a measure of immediately
experienced test anxiety and not specifically math
anxiety. Responses to each item were made on a
5-point scale, with the five possible responses ranging
from “the statement does not describe my feeling” to
“the feeling is very strong.”

Three performance measures were utilized (first
exam, final exam, and course grade; all were converted
to a 100-point scale). Furthermore, two questions
concerning attitudes toward the course were adminis- 
tered following completion of the MARS. The first 
assessed the perceived difficulty of the course, and the 
second assessed the importance of making a good grade 
in the course. It is possible that differences in such 
grade expectancy and aspiration may be important 
moderator variables. 


Results 


Initially, mean scores and correlations 
were computed separately for males and fe- 
males within each group. However, there 
were no sex differences in MARS, worry, or 
emotionality scores, and no significant dif- 
ferences in the magnitude of the correlations 
for males and females. As predicted, psy- 
chology students had higher MARS total 
scores (and Math Class Anxiety scores) than 
math students (see Table 1). However, 
group differences in total MARS scores were 
not predictive of group differences in worry 
and emotionality scores at exam time, that 
is, there were no group differences in worry 
and emotionality scores. 
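
The t ratio for the MARS total in Table 1 can be approximated from the tabled means, standard deviations, and group sizes. The sketch below assumes a pooled-variance independent-samples t test with 104 df; the source does not state which form of the test was used, and the function name is hypothetical.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Independent-samples t ratio using a pooled variance estimate."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_b - mean_a) / standard_error, df


# MARS total: math students (n = 54) vs. psychology students (n = 52).
t_mars, df_mars = pooled_t(173.72, 47.68, 54, 195.58, 52.95, 52)
```

Under this assumption the result lands close to the tabled value of 2.24 for the MARS total.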

Correlations between the subjects’ MARS 
scores and both the emotionality and worry 
components of anxiety as aroused by the first 
exam situation are presented in Table 2. 
With the exception of Math Class Anxiety 
scores for math students, there were signif- 
icant positive relationships among these 
variables for both mathematics and psy- 
chology students, as expected. Correlations 
were somewhat (nonsignificantly) higher for 
Table 2
Product-Moment Correlations for Math Students (Above the Diagonal) and Psychology
Students (Below the Diagonal)


Variable 1 2 3 4 5 6 7 8 9 10 11
1. MARS totala 87 86 93 33 30 45 18 -21 —.21 -39 
E ee 91 m 81 47 2 40 05 —14 -17 -I 
x ere 82 72 80 42 34 48 3 —326 -.33 -32 
I Spa 91 88 82 m 3 M 42-23 -2 -M 
5. Emotionality 56 51 58 .50 B1 40 15 -.06 —20 -18 
6. Worry ae da ae aie TL 4 38 —94 —36 -.3 
7. Perceived course 

difficulty 62 58 67 57 39 37 40 -39 -48 -al 
8. Grade aspiration E ME), EPES.) BRAR CSA 15-011 Oz EN S6; —.06 ~.12 —.09 
9. First exam grade Si a 07 18.4 a 27 73 3) 
10. Final exam grade =37 —34 —16 -26 —40 -—39 —41 —02 45 94 
11. Course grade 230 27 -ie 24-48 9 41-32. .19  .83  .87 


Note. For the math students, n = 54; for the psychology students, n = 52; for both groups, r (.05) = .27. 


ᵃ MARS = Mathematics Anxiety Rating Scale.


psychology students, especially where the 
emotionality variable was concerned. Math 
and psychology students perceived their 
courses to be equally difficult, but psychol- 
ogy students rated the importance of making 
a good grade in the course significantly
higher. However, only scores on the “per- 
ceived difficulty of the course” item were 
consistently related to anxiety and perfor- 
mance variables. 
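
The critical value r(.05) = .27 given in the note to Table 2 can be recovered from the relation between r and t. The sketch below assumes a two-tailed test and takes the critical t values (about 2.007 for 52 df and 2.009 for 50 df) from a standard t table rather than computing them; neither assumption is stated in the source, and the function name is hypothetical.

```python
import math


def critical_r(t_critical, n):
    """Smallest |r| reaching significance, given the critical t for n - 2 df."""
    df = n - 2
    return t_critical / math.sqrt(t_critical ** 2 + df)


# Assumed two-tailed .05 critical t values from a standard table.
r_math = critical_r(2.007, 54)    # math students, 52 df
r_psych = critical_r(2.009, 52)   # psychology students, 50 df
```

Both values round to .27, which is consistent with a single critical value being reported for the two groups.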

Total MARS scores were significantly 
inversely related to performance only for 
psychology students. Only the Math Test 
Anxiety subscale was correlated with per- 
formance for the math students. These 
group differences in anxiety-performance 
relationships were in the predicted direction. 
It should be noted that all of the significant 
MARS-performance correlations involved 
final exam and course grades rather than 
first exam performance. 

Significant inverse relationships between
worry and all performance measures were
noted for both mathematics students and
psychology students, as predicted. But
interestingly, the psychology students’ data
revealed a significant inverse relationship
between emotionality and performance
also, a finding that is contrary to prediction
and to the findings of the preliminary study
reported earlier. While the worry-performance
relationship is more robust and consistent,
the conditions under which emotionality
relates to performance have not yet
been clarified. Doctor and Altman (1969)
found that emotionality relates to performance
only when worry scores are low,
whereas Deffenbacher (1977) found the
relationship to hold only when worry scores are
high. However, the math and psychology
groups did not differ in worry scores in this
study. The crucial variable here may be the
unusually high correlation between worry
and emotionality in the psychology group
(.71) as compared to the math group (.31).
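As an illustrative aside, the size of the gap between these two correlations can be gauged with Fisher's r-to-z transformation. The sketch below is not part of the reported analysis: it assumes the two groups are independent samples, takes the group sizes from the table note (n = 52 for psychology students, n = 54 for math students), and uses a hypothetical function name.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    z1 = math.atanh(r1)  # Fisher r-to-z transform of the first correlation
    z2 = math.atanh(r2)  # Fisher r-to-z transform of the second correlation
    # Standard error of the difference between two transformed correlations
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se

# Worry-emotionality correlations reported in the text:
# .71 for psychology students (n = 52), .31 for math students (n = 54).
z = fisher_z_difference(0.71, 52, 0.31, 54)
print(round(z, 2))
```

Under these assumptions, an absolute value above 1.96 would indicate that the two correlations differ at the .05 level (two-tailed).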


Discussion 


The significant positive relationships
among MARS, worry, and emotionality
scores found in this study for both psychology
and mathematics majors suggest that the
MARS can be used as a valid predictor of
test anxiety reactions in math classes. The
correlations presented here are higher than
those typically found in test anxiety studies.
The Math Test Anxiety subscale was a
somewhat better predictor of worry and
emotionality scores in the actual test situation
and might well be used alone when a
brief predictive measure is desired. This
subscale was also the best single MARS
predictor of performance for math students
in both studies. Intercorrelations among
MARS subscales and total scores indicate a
remarkably high degree of internal consistency
considering the makeup of the 94-item
total score, that is, largely consisting of items
referring to nonacademic settings, and that
of the 8- to 11-item subscales, that is, referring
to specific aspects of academic experiences.
Results indicate that worry is the anxiety
variable with the most powerful effect on
academic performance. In particular, the
markedly different patterns of relationships
between worry and emotionality and
performance variables were apparent for math
students. Thus, the same pattern of
worry-performance correlations is shown to
hold in the case of mathematics as has been
previously shown in studies of general aptitude
and intelligence test performance
(Deffenbacher, 1977; Morris & Liebert,
1969), typing performance (Morris et al.,
1975), and performance in psychology
courses (e.g., Morris & Liebert, 1970). The
negative worry-performance correlations
may be interpreted in two ways, either of
which is consistent with the worry-emotionality
distinction. Worrying during tests
may interfere with performance by distracting
attention from the task, as suggested
by Wine (1971) and Sarason (1975).
Alternatively, the inverse correlations may be
accounted for by assuming that worry scores
reflect concern about accurately perceived
past and present performance difficulties.
Obviously, the two explanations are not
mutually exclusive, and both factors may
operate simultaneously. Neither process
involves emotionality directly. To the extent
that mathematics anxiety (MARS) is
either expressed in the testing situation as
debilitative worry or is itself a reflection of
correct negative performance expectancies,
MARS scores should also correlate negatively
with performance.

Following the distinction between trait
anxiety and state anxiety (Spielberger,
1966), we assume that the MARS is a
situation-specific trait anxiety measure, indicating
proneness to experience state anxiety
(worry and emotionality in this case) in
evaluative situations involving mathematics.
It should be noted that such an assumption
is not unequivocal, since the MARS
instructions are somewhat ambiguous (“How
do you feel ‘nowadays?’ ”), and the items do
refer to very specific situations. While the
data presented here cannot resolve this
question, it seems reasonable to assume that
consistently significant correlations between
MARS and worry-emotionality such as
those reported here support a trait
interpretation. There is little reason to believe
that two vastly different state anxiety
measures, one given under nonstressful and the
other under stressful conditions with the
administration of the two separated by a
span of time, would correlate highly.
Furthermore, MARS scores appear to function
in a traitlike manner in their relationship to
performance in that they predict final exam
and course grades (for psychology students)
rather than first exam grades, indicating
perhaps a cumulative effect on performance
throughout the semester. In contrast, worry
affected performance in the initial situation
in which it was aroused, thereby affecting
course grades.

Thus, the effect of mathematics anxiety 
per se on performance is probably indirect 
and unlikely to be found consistently in 
specific situations. Nevertheless, the 
MARS probably represents an improvement 
over more general test anxiety scales in 
predicting performance difficulties (cf. 
Hambleton & Traub, 1974; Szetela, 1973; 
Towle & Merrill, 1975), partially because it 
predicts with greater accuracy who will be- 
come worried about their performance in 
situations involving the use of mathematics.

Reference Note


1. Spielberger, C. D., Gonzalez, H. P., Taylor, C. J., & 
Anton, W. D. Preliminary test manual for the Test 
Anxiety Inventory. Manuscript in preparation, 
University of South Florida, 1977. 
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Father Absence, Educational Preparedness, and 
Academic Achievement: A Test of the Confluence Model 
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Predictions of academic deficits due to early and continuing parental absence, 


as derived from Zajonc’s and Markus’s confluence model, were investigated.
To test these predictions, equal numbers of father-present and father-absent
lower-class black kindergarteners (60 of each sex) were assessed on 12
educational preparedness measures. Two years later, they were tested for reading,
mathematics, and language arts achievement. A Father Absence × Sex analysis
of covariance (with social class controlled) of preparedness factor scores
revealed no significant effects. Similar multivariate analysis of the achievement
criteria revealed main (favoring father-present subjects) and interaction
effects on the mathematics test. Pair-wise comparisons suggested that father
presence facilitated the mathematics performance of girls more than boys.


The results only partially support the 


The adverse effects of early father ab- 
sence and increased family size on intelli- 
gence, academic aptitude, and educational 
achievement have been well documented by 
a number of investigators (e.g., Baughman 
& Dahlstrom, 1968; Belmont & Marolla, 
1973; Carlsmith, 1964; Lynn, 1974). In spite 
of imaginative explanations for such find- 
ings, until the recent advent of the con- 
fluence model of intelligence (Zajonc, 1976; 
Zajonc & Markus, 1975), there have been few 
attempts to interpret the effects within a 
unified theoretical framework. The con- 
fluence model not only appears to provide an 
excellent theoretical frame of reference for 
shedding light on these earlier results but 
also seems sufficiently well developed to
predict additional outcomes. 


"The research reported here is based on investigations 
conducted for the first author's doctoral dissertation at
the University of Virginia. 

Grateful acknowledgement is extended to Dallas R. 
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Charlottesville Public Schools, Virginia, for his assis- 
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organization of the field work that contributed to the 
success of the project.

Requests for reprints should be sent to Herbert C. 
Richards, Foundations of Education, University of 
Virginia, 140 Ruffner Hall, 405 Emmet Street, Char- 
lottesville, Virginia 22903. 


confluence model predictions. 


The basic tenet of the confluence model is 
that the course of a child's intellectual de- 
velopment is profoundly influenced by 
family configuration. This effect is mani- 
fested through the "intellectual environ- 
ment" of the home, which is conceived as the 
numerical average of the intellectual con- 
tributions made by the household members. 
Presumably, then, a child with many youn- 
ger siblings lives in a comparatively impov- 
erished intellectual environment, owing to 
the higher proportion of household members 
of lower mental age. Conceptualizing a 
family's intellectual environment in this 
manner also provides a fresh perspective on 
the academic effects of father absence, for it 
follows directly that the intellectual envi- 
ronment of the family “manifests the most 
dramatic changes when there is an addition 
to or departure from the family” (Zajonc, 
1976, p. 227). When a father is chronically
absent, there would necessarily be a decrease
in the quality of the intellectual environment,
since one member of high mental age
has been removed from the family configuration.
Children from households with a
long-absent father should be educationally
less well prepared for school and perform
more poorly on early achievement tests. Such
deficits, although they are believed to be 
cumulative over many years of schooling, 
should begin to be apparent in the early 
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years of formal education because children 
spend a high proportion of their time at 
home prior to the beginning of their public 
school experience. 
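The averaging idea behind the "intellectual environment" can be made concrete with a small sketch. The numbers below are hypothetical arbitrary units chosen only for illustration (an adult level of 30, a young child at 4, a newborn at 0); they are not values reported in this study, and the function name is likewise an assumption.

```python
from statistics import mean

def intellectual_environment(levels):
    """Average intellectual level of all household members (arbitrary units)."""
    return mean(levels)

ADULT = 30.0    # hypothetical adult level
CHILD = 4.0     # hypothetical level of a young child
NEWBORN = 0.0   # hypothetical level of a newborn sibling

# Two-parent household with one young child and one newborn
father_present = intellectual_environment([ADULT, ADULT, CHILD, NEWBORN])

# Same household with the father removed
father_absent = intellectual_environment([ADULT, CHILD, NEWBORN])

# Two-parent household before the newborn arrives
before_sibling = intellectual_environment([ADULT, ADULT, CHILD])

print(father_present, father_absent, before_sibling)
```

In this toy household, removing an adult lowers the average, and so does adding a newborn, which is the direction of both effects described above.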

Although the results of many recent 

studies are remarkably consistent with pre- 
dictions derived from the confluence model, 
a variety of methodological shortcomings has 
obscured the meaning of these efforts (see 
Herzog & Sudia, 1973). Extant research has 
also been based primarily, but not exclu- 
sively, on data from white adolescents (see 
Santrock, 1972, for a representative research 
effort with preadolescents and Sciara, 1975, 
for a tentative confirmation of the model 
among black children). The major purpose 
of the present investigation is to test the 
usefulness of the confluence model in pre- 
dicting the educational preparedness and 
academic achievement of black children at 
the beginning of their public education. 
Efforts have been made to control for such 
confounding influences as length of father 
absence and social class differences. 


Method 
Subjects 


The subjects, 120 black children, were predominantly 
from lower-income homes in Charlottesville, Virginia; 
half of the 60 boys and half of the 60 girls were from 
father-absent households. Each father-absent child
taking part in this research was carefully screened to 
insure that absence had occurred before his or her 
fourth birthday and that there had been no father in the 
home through the second grade. A detailed description 
of how the families were selected and interviewed across 
this time span can be found in Fowler (1977). Those 
children whose mothers had remarried prior to the 
child's second-grade entrance, who lived in extended
families, or whose adult relatives had assumed primary 
caretaking responsibilities for them were not included 
in the sample. It should be noted that father-present 
and father-absent subjects did not differ significantly 


in number of siblings (means for th 
and 2.20, respectively).


Measures and Procedures 


Social class and family size. The two-factor index
of social position (Hollingshead, Note 1) was used for
assessing the social class standing of each participating
family. In contrast to the original Hollingshead
procedure, scaling was done in such a manner that
higher scores indicated higher social class standing.
Since children living in extended families were not
included in the study (the presence of such children would
confound the predicted effects of the confluence model),
the number of siblings in the home served as the
quantitative index of family size. The demographic data
were obtained from school files initially, but the family
status of each child including father presence or father
absence, the age of the child when the father left the
household, the social class standing of the family, and
the number of siblings present in the home was also
verified by a visiting teacher or social worker.

Educational preparedness. Upon entrance into
kindergarten, each subject was assessed on three
measurement batteries, yielding 12 distinct scores (all
preparedness and achievement measures were those
routinely administered by the school system).

First, the Early Detection Inventory (McGahan &
McGahan, 1967, 1973) was administered individually
by resource teachers. Five separate scores were
obtained: general school readiness, human figure
drawing, geometric designs, gross motor skills, and an
additional index of fine motor skills.

Second, classroom teachers made behavioral ratings
for each child on the Social Psychological Adjustment
Inventory (Fowler & Crowe, Note 2). This instrument
consists of 17 Likert-type scales (each a 5-point
continuum, with a higher score indicating better
adjustment) that assess the peer relationships of the child, his
or her task orientation, attention span, and
interpersonal relationships with adults. Because of the
consistently high intercorrelations among these scales
found in previous research, a single composite index of
teacher-perceived behavioral adjustment was
derived.

Finally, the Metropolitan Readiness Tests (Hildreth,
Griffiths, & McGowran, 1969) were administered to
groups of children under the supervision of the
classroom teachers. This test consists of six components,
each scored separately: word meaning, listening,
matching, alphabet, numbers, and copying.

Academic achievement. During the second grade,
all the subjects were administered the Science Research
Associates' (1971) Assessment Survey (SRA) by
classroom teachers according to standard instructions. The
data were then sent to the publisher for scoring and
preliminary analysis. Three achievement scores were
subsequently obtained for each child: mathematics,
reading, and language arts.


Preliminary Analysis 

The number of educational preparedness indices (12
in all) was reduced through factor analysis to a more
manageable number of dimensions. Before scores on
such dimensions were obtained, however, it was first
necessary to ascertain the degree of structural similarity
of the measures made on the respective subgroups of
youngsters (i.e., the extent to which factors identified
for one group would correspond to those obtained for
the other). Only if there was a reasonable degree of
structural similarity across father-present and
father-absent groups would it be appropriate to compare them
for mean differences on the derived factor scores.
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Product-Moment Correlations of Social Class Standing and Family Size with Factor Scores and 
the Science Research Associates’ (SRA) Assessment Survey Scales 


                                        Group
                   Father present     Father absent       Combined
                   Social   Family    Social   Family    Social   Family
Variable           class    size      class    size      class    size
Factor score        .67*    -.05       .46*    -.08       .62*    -.04
SRA scale
  Mathematics       .50*  [illegible]  .34*     .01       .52*    -.07
  Language Arts     .63*  [illegible]  .37*    -.02       .63*    -.07
  Reading           .70*  [illegible]  .99*     .05       .64*    -.08

Note. Each correlation is based on 58 degrees of freedom for the separate groups and 118 degrees of freedom for the combined group.
* p < .01.


Parallel, but independent, factor analyses were
conducted on the two groups. The results indicated that
the factor patterns of the father-absent and
father-present groups were highly similar with respect to the
educational preparedness measures. The groups were
therefore combined for the purpose of estimating factor
scores. These scores were used as criteria in subsequent
analyses. Details of this analysis can be found in Fowler
(1977).


Results and Discussion 


Correlational Analysis 


To provide an overview of the data, a 
correlation matrix was generated that in- 
cluded the various demographic, prepared- 
ness, and achievement measures. The cor- 
relations in this matrix guided the selection 
of covariates in subsequent analyses. As can 
be seen in Table 1, higher social class
standing is associated with better performance
on the criteria, an effect that is completely
consistent with previous research. It
follows that social class standing should be
controlled, even for subjects falling in the
narrow ranges permitted by the study,
when comparing father-absent and
father-present children.
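The significance of any single correlation in Table 1 can be checked by converting r to a t statistic with the stated degrees of freedom. The sketch below is only an illustration: the function name is hypothetical, and the critical value of 2.66 is an approximate textbook value for a two-tailed test at the .01 level with about 60 degrees of freedom, not a figure taken from this article.

```python
import math

def t_from_r(r, df):
    """Convert a product-moment correlation to a t statistic with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))

# Approximate two-tailed .01 critical value of t for roughly 60 df (assumed table value)
T_CRIT_01 = 2.66

# Smallest starred social class correlation in the separate groups:
# mathematics in the father-absent group, r = .34 with 58 degrees of freedom.
t = t_from_r(0.34, 58)
print(round(t, 2), t > T_CRIT_01)
```

Under these assumptions, even the smallest starred correlation clears the .01 threshold, which agrees with the asterisks in the table.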

Contrary to expectations based on the 
confluence model, however, no high-mag- 
nitude negative relationships were found 
between family size and the criteria. It is 
possible, of course, that these relationships
have been tested too early; higher-magnitude 
negative correlations might well be found in 
later elementary school children. Studies 


in which higher correlations have been found 
(e.g., Solomon, Hirsch, Scheinfeld, & Jack- 
son, 1972) have typically been conducted on 
older children and adolescents. Apparently, 
the confluence model cannot be extended 
without qualification to early childhood. 
These results are also consistent with the 
findings of Grotevent, Scarr, and Weinberg 
(in press). 


Analysis of Educational Preparedness 


Next, a 2 X 2 (Father Absence X Sex) 
analysis of covariance (the effects of social 
class were controlled) was conducted on the 
educational preparedness factor scores. The 
results are shown in Table 2. No significant
differences are present for the contrasts of
father-present and father-absent subjects
(the respective means are .26 and -.26) or
males and females (the means are -.06 and
.06, respectively). In addition, no significant
interaction effect is present in these results.


Table 2 


Analysis of Covariance of Factor Scores 


Source                  df       MS        F
Regression               1     34.67     60.75*
Father absence (A)       1       .21       .36
Sex (B)                  1       .32       .56
A × B                    1       .24       .42
Within                 115       .57

Note. Social class standing is the covariate.
* p « 0. 
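Each F ratio in Table 2 is the mean square for that source divided by the within-groups mean square. The sketch below recomputes the ratios as a consistency check; it assumes the decimal points lost in the scan belong where shown (.21, .32, .24, and .42), and the variable names are hypothetical.

```python
# Mean squares and reported F ratios from Table 2.
# The decimal placement for .21, .32, .24 and .42 is an assumption.
MS_WITHIN = 0.57

table2 = {
    "Regression":         (34.67, 60.75),
    "Father absence (A)": (0.21, 0.36),
    "Sex (B)":            (0.32, 0.56),
    "A x B":              (0.24, 0.42),
}

recomputed = {}
for source, (ms, reported_f) in table2.items():
    f_ratio = ms / MS_WITHIN  # F = MS(source) / MS(within)
    recomputed[source] = f_ratio
    print(f"{source:20s} reported F = {reported_f:6.2f}  recomputed F = {f_ratio:6.2f}")
```

Small discrepancies are expected because the published mean squares are rounded to two decimals.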
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Table 3 
Multivariate and Univariate Analyses of 
Covariance of the Science Research 


Associates’ Assessment Survey Criteria 
Source                  df        MS          F

Multivariate analysis
  Regression             3    39,563.37    24.75**
  Father absence (A)     3     4,955.71     3.10*
  Sex (B)                3       847.22      .53
  A × B                  3     4,555.79     2.86*
  Within               113     1,598.52

Univariate analyses: Father-absence effect
  Mathematics            1    12,640.84     7.77**
  Language Arts          1     3,385.54     2.64
  Reading                1     1,406.20     1.30

Univariate analyses: A × B interaction
  Mathematics            1    10,068.22     6.19**
  Language Arts          1     1,641.20     1.28
  Reading                1         5.90      .01

Note. Effects of social class are controlled in all analyses; N =
120 (30 subjects per group).
* p < .05.
** p < .01.


The findings do not provide any support for 
the confluence model's prediction of an av- 
erage difference in educational preparedness 
favoring father-present children. These 
results are consistent, though, with other 
reported research (e.g., Hess, Shipman,
Brophy, & Bear, 1968), which has often 
found no differences in intelligence or aca- 
demic achievement between father-present 
and father-absent lower-class black chil- 
dren. 

Again, has this relationship between 

family status and educational achievement 
simply been tested too early? Often the 
failure to reject a null hypothesis results in 
annoying ambiguity (Rozeboom, 1960) and 
commonly does little to alter our beliefs, but 
the current findings underscore the importance
of establishing more explicit boundary
conditions for the application of the con- 
fluence model in order to assess its theoret- 
ical range. At its present stage of develop- 
ment, it is difficult to argue conclusively on 
the basis of the present investigation’s 
findings that the confluence model has been 
decisively falsified (Salmon, 1971). 
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Analysis of Academic Achievement i 


Finally, a multivariate analysis of covar- 
iance of similar design was performed on the 
three SRA criteria. When significant effects 
were indicated by the multivariate criterion 
(Wilks’s lambda), corresponding univariate 
analyses were performed on the SRA criteria 
also. The analysis of significant interaction
effects was completed by constructing pair-wise
comparisons of groups with Mahalanobis’s
(1936) generalized distance function, D².
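For readers unfamiliar with the statistic, D² measures the squared distance between two group mean vectors, scaled by the pooled within-group covariance matrix. The sketch below is a generic two-variable illustration with made-up numbers; it does not reproduce the covariance-adjusted contrasts computed in this study, and the function name is hypothetical.

```python
def mahalanobis_d2(mean_a, mean_b, cov):
    """Squared Mahalanobis distance between two 2-variable mean vectors.

    cov is the pooled 2 x 2 within-group covariance matrix, given as
    ((s11, s12), (s21, s22)).
    """
    d1 = mean_a[0] - mean_b[0]
    d2 = mean_a[1] - mean_b[1]
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Elements of the inverse of the 2 x 2 covariance matrix
    i11, i12 = s22 / det, -s12 / det
    i21, i22 = -s21 / det, s11 / det
    # Quadratic form d' S^-1 d
    return d1 * (i11 * d1 + i12 * d2) + d2 * (i21 * d1 + i22 * d2)

# Hypothetical example: uncorrelated variables with variances 4 and 1
print(mahalanobis_d2((3.0, 2.0), (1.0, 1.0), ((4.0, 0.0), (0.0, 1.0))))  # 2.0
```

With an identity covariance matrix the statistic reduces to the ordinary squared Euclidean distance between the two mean vectors.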

The results of the Father Absence × Sex
multivariate analysis of covariance (and the
corresponding univariate analyses) of the
three SRA criteria are presented in Table 3.
In addition, the unadjusted means of the
four groups (father-present males,
father-present females, father-absent males, and
father-absent females) on the SRA
achievement tests are set out in Table 4.

In surveying the table of unadjusted
means, it is interesting to note that without
fail on the mathematics and language arts
criteria, the ordinal rank (from best performance
to worst) of the four groups is (1)
father-present females, (2) father-present
males, (3) father-absent males, and (4)
father-absent females. Although this picture


Table 4 

Means and Standard Deviations of Male and
Female Father-Present and Father-Absent
Children on the Science Research Associates’
(SRA) Assessment Survey Scales

                                     SRA scale
Group                     Mathematics  Language Arts   Reading
Father-present males
  M                          123.37       124.97       149.37
  SD                          52.71        58.45        52.78
Father-absent males
  M                          101.80        93.77     [illegible]
  SD                          37.71        32.44        34.89
Father-present females
  M                          148.23       135.30       148.57
  SD                          46.74        42.40     [illegible]
Father-absent females
  M                           93.23        93.13     [illegible]
  SD                       [illegible]  [illegible]  [illegible]

Note. These means are unadjusted for the influence of social
class standing; n = 30 for each group.
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» alters somewhat on the reading criterion, 
which is inevitably the best performance for 
each group, the father-present females con- 
sistently demonstrate the best performance 
on all three achievement tests (range = 
135.30 to 148.47). 

The multivariate analysis of covariance 
(see Table 3) revealed both a significant 
main effect for father absence and a signifi- 
cant interaction effect. Separate univariate 
analyses of each of the three criteria indicate 
that the overall effect was due primarily to
the comparative superiority of the
father-present subjects on the mathematics
criterion. The standardized discriminant
function coefficients for each of the criteria also
confirm the importance of mathematics in
discriminating the groups: -.78 (mathematics),
-.54 (language arts), and .38
(reading). The significant main effect is in
accord with the confluence model and con- 
sistent with prior research with lower-class 
black children (see Sciara, 1975); the dif- 
ferential sensitivity of the mathematics cri- 
terion (a finding often commented on by 
other writers; see Lynn, 1974; Zajonc, 1976) 
awaits explanation. 

Interpretation of the main effect favoring 
father-present children is, however, com- 
plicated by the occurrence of the significant 
interaction effect, a result unforeseen by the 
confluence model and thus representing a 
strong constraint on the adequacy of the 
model’s explanatory power (Putnam, 1973). 
Again, the significant discriminant variable, 
as revealed by the univariate analyses, is the 
mathematics criterion. The associated 
standardized discriminant function coeffi- 
cients are the following: —.81 (mathemat- 
ics), —.41 (language arts), and .44 (reading). 
The magnitudes of the coefficients suggest 
the importance of the mathematics criterion 
in distinguishing among the groups. 

To analyze the interaction effect completely,
pair-wise comparisons of the four
groups (father-present males, father-present
females, father-absent males, and
father-absent females) on the mathematics
criterion were made with Mahalanobis's
generalized distance function, D², after
adjustment of the criterion for differences among
the four groups due to social class standing.
These contrasts are shown in Table 5.




Table 5 

Partial F ratios for Pair-Wise Comparisons of
Male and Female Father-Present and
Father-Absent Children on the Mathematics Scale of
the Science Research Associates' Assessment
Survey

Group contrast                                            F
Father-present females versus father-absent males       9.26**
Father-present females versus father-absent females     7.93**
Father-present males versus father-absent males         4.14*
Father-present males versus father-absent females       3.62*
Father-present females versus father-present males      3.38*
Father-absent females versus father-absent males        1.65

Note. The group means are adjusted for the influence of social
class standing. The partial F ratio is Mahalanobis's generalized
distance function, D². Each contrast is based on 2 and 115
degrees of freedom.
* p < .05.
** p < .01.


In terms of generalized distance, it can be 
seen that father-present females (who per- 
form best) are relatively more distant than 
father-present males (p < .05) from either 
father-absent group. The father-absent 
groups are, as predicted, homogeneous (p < 
.25). Father presence among lower-class 
black children in early childhood apparently 
benefits girls more, while father absence 
penalizes both sexes equally. It must be 
emphasized that this finding cannot be derived
easily either from the confluence model
in its present form or from general theories
of sex role identification. Explanations
must be sought elsewhere.

Conceivably, the comparative difference 
between father-present females and fa- 
ther-present males at this age could reflect 
a unique underpinning to the father-present 
black family that allows female children to 
be affectively closer to both parents, conse- 
quently benefiting from increased nurtur- 
ance (from mothers) and consistently 
structured expectations for school-related 
achievement (from fathers). The psycho- 
logical cost exacted from girls in intact 
families could well be increased conformity 
to parental control, such increased compli- 
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ance being manifested in better school 
achievement than boys. How this devel- 
opment, that is, the differential affective 
distance of girls and boys to both parents in 
the father-present family, then relates to 
later influences of the peer culture and sub- 
sequent cognitive development remains an 
open issue, however. 

The integration of seemingly disparate 
research on the cognitive effects of father- 
absence and family size (as well as birth 
order and sibling intervals) is an undeniably 
attractive feature of the confluence model 
and attests to the theoretical range of the 
construct of the family intellectual envi- 
ronment. At this stage of the model’s de- 
velopment, however, the findings of the 
present investigation point to a direction in 
which the confluence model must be re- 
fined. 

It is important to recognize that father- 
absent families do not differ from father- 
present ones simply along this dimension, 
but that father absence due to divorce, sep- 
aration, or death often initiates a complex 
cycle of events involving changes of economic 
and occupational status and profound 
changes in the schema of parent-child
interpersonal relationships; in sum, a wide
range of social and emotional problems
descend upon the father-absent family.

The family as an objective unit for social 
analysis is strikingly different from the 
family as a set of internalized relations and 
prescriptions for interaction and development
(Laing, 1972). Beca


social context of the family and in turn con- 

vincingly coordinate the interpersonal realm 

of the family with its cognitive realm. 
Reference Notes 

1, Hollingshead, A. Two-factor index of social posi- 
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tion of educational preparedness. Unpublished 
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"This study investigated the effects of status level and status congruence on 
achievement and satisfaction of junior high classmates undergoing same-age 
peer tutoring, thereby conceptually replicating an already published
experiment with college students. There were 107 same-sex pairs who were formed
so that the tutor had greater (the status-congruent condition), equal, or lesser 
initial math competence than the tutee. Two weeks were devoted to daily 
tutor training and tutoring, weekly reviews, and weekly assessments of 
achievement and attitudes. Then, partners exchanged roles for another 2- 
week round of such activity. Two multivariate contrast analyses clearly sup- 
port and cross-validate the prediction that satisfaction and perceived achieve- 


ment (and in one analysis, 
tutor than the tutee, parti 
partner, 


icularly if oi 


In an earlier study (Rosen, Powell, & 
Schubot, 1977), we proposed that the status 
implications of role assignments in the spe- 
cial kind of helping relationship known as 
same-age peer tutoring carry with them the 
potential for inviting invidious comparison 
and felt inequity. As a consequence, the
relative performance and morale of the tu- 
toring partners would be affected. This 
thesis was investigated with upper-division 
college undergraduates of both sexes (mainly 
female) enrolled in several sections of edu- 
cational psychology by means of the fol- 
lowing procedure. 


The project presented herein was performed pursu- 
ant to a grant (NIE-G-74-0023) from the National In- 
stitute of Education, U.S. Department of Health, Ed- 
ucation, and Welfare to the first two authors. The 
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position or policy of the National Institute of Education, 
and no official endorsement by the National Institute 
of Education should be inferred 
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actual achievement) are greater on becoming the 


ne was initially the more competent 


After being pretested on course-related
material, half of the volunteering class
participants were told they would be randomly
assigned to the role of tutor and the other
half would be assigned the role of tutee for
the first tutoring experience. In one third
of the same-sex pairs, the partners learned
that the tutor’s pretest competence exceeded
the tutee’s, a strongly equitable or
status-congruent arrangement. In another third
of the pairs, the partners learned that their 
competencies were equal; and in the re- 
maining third, they learned that the tutor's 
competence was less than the tutee's. These 
arrangements were of weak and strong in- 
equity (status incongruence), respectively. 

During the first session, which was to 
cover half of the course-related material, the
tutors were trained and then released to 
tutor their respective partners. All partic- 
ipants then completed a posttest on that half 
and an attitudinal questionnaire. The 
posttest contained items that paralleled half 
of the items on the pretest. On the next 
class day, the participants in half of the pairs 
learned that the two partners would ex- 
change roles for the second session, while 
those in the remaining pairs learned that 


each partner would continue in the same 
role. Tutor training and tutoring on the 
remainder of the course-relevant material 
then took place, followed by another posttest 
and attitudinal assessment. The posttest 
consisted of items that paralleled the re- 
maining half of the pretest items. 

Two theoretical models were successfully 
applied, one dealing with the first session
and the second dealing with the predicted
changes in the outcomes from the first to the 
second session. This change model was the 
major focus. One expectation underlying 
the change model was that relatively greater 
changes in performance and satisfaction 
would occur following a change in role than 
a continuation of role, the direction of 
change depending on whether the change (if 
any) was to the more desirable role of tutor 
or to the less desirable role of tutee. This 
dimension of (change in) tutorial status was 
translated into several values, with a more 
positive value being assigned to a change 
from tutee to tutor (+3) than to continuation 
as tutor (+1) and a more negative value 
being assigned to a change from tutor to 
tutee (—3) than to continuation as tutee 
(-1). Evidence that a superordinate role is 
more satisfying than a subordinate role is 
found in Kardush’s (1968) experiment on 
status congruence. Furthermore, the as- 
sumption that an undesirable-to-desirable 
sequence is more satisfying than a desir- 
able-to-undesirable sequence is consonant 
with similar assumptions by others (cf. Ar- 
onson & Linder, 1965; Sampson, 1969). 
The second expectation underlying the 
change model was derived from equity 
theory (cf. Adams, 1965; Walster, Berscheid, 
& Walster, 1973) and related theorizing on 
status congruence (Sampson, 1969). Larger 
changes in the performance and satisfaction 
of both partners would follow from large 
changes than small changes in degree of eq- 
uity. The direction of change would depend 
on whether the change (if any) in equity 
status was toward greater or toward lesser 
equity. To represent this dimension of 
(change in) equity status, a more positive 
value was assigned for production (+2) than 
for maintenance (+1) of strong equity (i.e., 
for a more competent tutee becoming a tutor 
than for a more competent tutor continuing
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to be a tutor); a more negative value for 
production (—2) than for maintenance (—1) 
of strong inequity (i.e., for a more competent 
tutor becoming a tutee than for a less com- 
petent tutor continuing on as a tutor); and 
a zero value for the remaining situations, 
which involved partners of equal compe- 
tence. 

Since the dimension of change in tutorial 
status and the dimension of change in equity 
status were considered to operate in the 
same direction, the relevant “desirability” 
values described above were added across 
the two dimensions to generate a set of the- 
oretical weights. This meant, of course, that 
the more competent tutee who became a 
tutor received the most positive weight, and 
the more competent tutor who became a 
tutee received the most negative weight. 

According to this set of derived weights, 
the difference in weights is greater for tuto- 
rial status than for equity status. Conse- 
quently, the overall impact of tutorial status 
is expected to be greater than that of equity 
status. There is theoretical justification for 
this. Adams (1965) and Kardush (1968) 
acknowledge that it is less upsetting to be 
overcompensated than undercompensated. 
This follows directly from one proposition in 
equity theory that admits that while equity 
is preferred to inequity, maximization of own 
outcomes (becoming a tutor) is also pre- 
ferred to nonmaximization (becoming a 
tutee).

The weights thus generated were applied
simultaneously as coefficients to all treat-
ment means, using a multivariate extension
(Bargmann, Note 1) of the method of con-
trasts. As in multivariate analysis of vari-
ance, this overall planned comparison was
followed by univariate applications of the
method of contrasts (Hays, 1963; Winer,
1971).
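
For readers unfamiliar with the univariate method of contrasts, the following is a minimal sketch of the textbook single-degree-of-freedom planned comparison, in which theoretical weights serve as coefficients on the treatment means. The function name and the input numbers are purely illustrative and are not data from these studies; the multivariate extension attributed to Bargmann is not shown.

```python
def contrast_F(means, weights, ns, ms_error):
    """F ratio (1 numerator df) for a planned contrast on treatment means.

    means    -- treatment (cell) means
    weights  -- contrast coefficients; they must sum to zero
    ns       -- number of observations in each cell
    ms_error -- error mean square from the analysis of variance
    """
    if abs(sum(weights)) > 1e-9:
        raise ValueError("contrast weights must sum to zero")
    # Estimated contrast: weighted sum of the treatment means.
    psi = sum(c * m for c, m in zip(weights, means))
    # Sum of squares attributable to the contrast.
    ss_contrast = psi ** 2 / sum(c ** 2 / n for c, n in zip(weights, ns))
    return ss_contrast / ms_error


# Invented two-cell illustration (not study data).
F_example = contrast_F(means=[1.0, -1.0], weights=[1, -1],
                       ns=[10, 10], ms_error=2.0)
```

The resulting F is evaluated on 1 and the error degrees of freedom; because the contrast value is squared, the F itself carries no information about direction.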

The multivariate contrast was found to be 
highly significant in the appropriate direc- 
tion, as were most of the univariate analyses. 
Changes in perceived performance, actual 
performance, and role satisfaction were 
clearly influenced by changes in assigned 
role status and degree of (in)equity. 

Since most proponents of peer tutoring 
operate in a public school setting, how might 
such a proponent, who happened to favor 
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same-age peer tutoring, respond to such 
findings? Let us play the role of devil's ad- 
vocate and consider more closely the cir- 
cumstances surrounding those college 
studies. 

One basic question is whether the under- 

lying social psychological variables of equity 
or status congruence and of status level also 
have relevance in the public school setting. 
A partial answer to this is that young chil- 
dren are clearly responsive to issues of equity 
(cf. Lerner, 1974) and that countless so- 
ciometric studies attest to the pertinence of 
status level in the school-age peer group. 
We might counter that academic compe- 
tence may be of less critical concern as a 
status variable to the public school student 
than to the more self-selected undergrad- 
uate. A reply to this might stress that being 
put “in charge” is satisfying per se and that
the public school student is probably aware 
of the achievement-related criteria that 
would be appropriate for having someone 
control someone else in learning a particular 
academic subject matter. To this, we might 
counter that in the last analysis, an empirical 
test of the generalizability of this theorizing 
was needed; but that in any case, the manner 
in which the college studies were carried out 
raises some methodological issues that need 
to be examined from the perspective of ex- 
ternal validity. 

For example, the college students knew 
they were to be in a brief experiment, that 
they could choose not to participate, that 
they would not learn their scores, and that 
the scores would not be factored into their 
course grades. Such a situation might have 
contributed to the operation of demand 
characteristics and generated little concern 
about the consequences of failing to do well.
In addition, while the subject matter was
course-relevant, it was not embedded in the
normal sequence of course preparations.
Moreover, the sample was relatively homo-
geneous with regard to “real” competence,
and the tutoring relationship, which was
relatively transitory (two sessions), involved
comparative strangers. Finally, the sample
was necessarily confined to partners who
were present on each occasion (thus resulting 
in substantial sample shrinkage), and the 
ages of paired classmates sometimes differed 
by as much as 8 years (something that does
not happen in the public school). The de-
fense might insist that it would be difficult
to derive the findings from the operation of
demand characteristics, or a lack of concern
with doing well, but might agree that a
proper resolution of this debate called for a
conceptual replication under circumstances
that more closely approximated the realities
of the public school classroom.

To implement such a conceptual replica-
tion in the public school, the previous re-
search design was altered to allow, first of all,
for 4 weeks of daily tutoring, each week to be 
followed by performance and attitudinal 
assessments. The second revision was that 
all students were to exchange roles with their 
same-sex partners at the beginning of the 
third week and to continue in those new roles 
for the balance of the study. These two 
design changes permitted us to study how 
the same student responds both to phases of 
role stability and to role exchange, without 
locking any given student into a role that he 
or she might initially regard as undesirable, 
and to lessen the extent to which the impact 
of a given role might progressively attenuate 
over the 4 weeks. The specific operational 
changes that were made with respect to the 
methodological issues raised above (e.g.,
subject attrition, “report card” relevance of 
learning the material, embeddedness of the 
study in the curriculum sequence, real 
competence, interpersonal acquaintance, 
and age differences) are dealt with later. 

Since the experimental design to be used
in the public school studies differed from 
that of the college studies, it was also neces- 
sary to adapt the old pair of theoretical 
models to make them more appropriate for 
the new design. One of these was a rela- 
tively simple 6-weight model of global 
change, which accounted for the changes
between Weeks 1 and 2 combined and
Weeks 3 and 4 combined. The other was a
more elaborate 18-weight model of phase
change, which dealt with changes between
successive weeks (i.e., between Weeks 1 and
2, Weeks 2 and 3, and Weeks 3 and 4). The 
theoretical weights for each change model 
are shown in Table 1. The method of con- 
structing them, which follows the same logic 
as that employed in constructing the models 
Table 1
Adjusted Theoretical Weights Used as
Coefficients in Testing the Predictive Value
of the Two Change Models

                       Initial tutor's competence
                       relative to tutee's
Phase (a)              Greater    Equal    Lesser

Tutor to tutee
  Phase 1               +1.5       +.5      -.5
  Phase 2               -6.5      -3.5      -.5
  Phase 3               -3.5      -1.5      +.5
  Global change (b)     -8.5      -4.5      -.5

Tutee to tutor
  Phase 1                +.5       -.5     -1.5
  Phase 2                +.5      +3.5     +6.5
  Phase 3                -.5      +1.5     +3.5
  Global change (b)      +.5      +4.5     +8.5


a Phase 1 represents the Week 2 — Week 1 difference, Phase 2 
represents the Week 3 — Week 2 difference, and Phase 3 rep- 
resents the Week 4 — Week 3 difference. 

b These global change weights represent the difference between 
Weeks 3 and 4 combined and Weeks 1 and 2 combined and can 
be calculated by adding the relevant phase weights. 
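
The construction of the adjusted weights can be retraced with a short sketch. It adds the tutorial status value, the equity status value, and the sequence constant for each phase, following the values given in Footnote 1. The dictionary and function names are hypothetical, and the zero equity values for equal-competence pairs are an assumption inferred from the tabled weights rather than a value stated in the surviving text.

```python
# Desirability values for Phases 1, 2, and 3, as described in Footnote 1.
TUTORIAL_STATUS = {
    "tutor_to_tutee": (+1, -3, -1),
    "tutee_to_tutor": (-1, +3, +1),
}

# Equity values, keyed by the initial tutor's competence relative to the tutee's.
EQUITY_STATUS = {
    "greater": (+1, -3, -2),  # strong equity, then strong inequity
    "equal": (0, 0, 0),       # assumed zero (inferred from Table 1)
    "lesser": (-1, +3, +2),   # strong inequity, then strong equity
}

# Constant added to every weight within a role sequence.
CONSTANT = {"tutor_to_tutee": -0.5, "tutee_to_tutor": +0.5}


def phase_weights(sequence, competence):
    """Adjusted weights for Phases 1-3 of one treatment cell."""
    return tuple(
        t + e + CONSTANT[sequence]
        for t, e in zip(TUTORIAL_STATUS[sequence], EQUITY_STATUS[competence])
    )


def global_weight(sequence, competence):
    """Global change weight: the sum of the three phase weights."""
    return sum(phase_weights(sequence, competence))


all_phase_weights = [
    w
    for seq in TUTORIAL_STATUS
    for comp in EQUITY_STATUS
    for w in phase_weights(seq, comp)
]
```

Under these assumptions the 18 phase weights and the 6 global weights each sum to zero, as the method of contrasts requires, and no individual weight is zero, so every treatment mean enters the comparison.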


used for the college studies, is provided in 
Footnote 1.¹


Method 




Subjects 


For the first of these public school studies, 104
sixth-grade students in four classes of mathematics were
recruited in the spring of the school year. Five pairs
were lost due to attrition, leaving 47 same-sex pairs who
had completed all phases and filled out all instruments.
Similarly, for the second study, which was conducted
in the fall of the school year, 134 sixth- and eighth-grade
mathematics students were recruited; after attrition,
60 pairs were left for analysis. Thus, sample attrition
was relatively more minor than in the college samples.

In order to accomplish the “relative competence”
manipulation, students were stratified so that within
each of the eight classrooms, their prior achievement in
mathematics reflected high, average, or low competence.
Actual prior grades and standardized math achievement
scores were used for this purpose.

Within each of the strata, same-sex students were 
paired so that one third of the pairs were of similar 
previous achievement levels, while two thirds had dis- 
similar prior achievement levels. The dissimilarity was 

kept small (i.e., within stratum) so that the disparities 
would not be unduly upsetting to the participants in 
those conditions. Within each tutor-tutee pair, one 
partner was randomly assigned the initial role of the 
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tutor, and the other was assigned to that of the tutee; 
thus, within the unequal conditions, half of the pairs 
had a tutor who was higher achieving than the tutee, 
while the converse was the case in the remaining half. 

The sequence of events in the 4-week study pro- 
ceeded as follows: On the first day (Monday), each 
class was introduced by the teacher to the notion of peer 
tutoring; then, each student was given a note naming his 
or her partner, their respective roles, and an indication 
of their relative math competencies to date (namely, as 
having performed “higher than,” “about the same as,”
or “lower than” their partner).

Each Monday, the students were told what their roles 
would be for the current week. In fact, the roles were 
exchanged only on the third Monday, and the initial 
relative competence of the partners was made salient 


In order to insure that space was available for both 
tutor training and control activities, two classes were run 
during a given daily class period. On all but the last day


prepared for that day; while for the same period of time, 


1 To account for changes between successive weeks 
of tutoring, a phase change model was constructed. 
Change in tutorial status is represented in the tutor- 
to-tutee sequence by desirability values of +1, —3, and 
—1, respectively, for the Phase 1 (Week 2 — Week 1) 
difference of continuing to be a tutor, the Phase 2 (Week 
3 — Week 2) difference of then becoming a tutee, and 
the Phase 3 (Week 4 — Week 3) difference of then con- 
tinuing as a tutee. Complementary values (-1, +3, and 
+1) are used for the tutee-tutor sequence. Change in 
equity status is represented by values of +1, —3, and —2, 
respectively, for continuing under strong equity (Phase 
1), then changing to strong inequity (Phase 2), then 
continuing under strong inequity (Phase 3). Comple- 
mentary values (—1, +3, and +2) are used for the op- 


trasts, the negligible constant of +.5 was added to all 
weights in the tutee-tutor sequence, and -.5 was added
to all weights in the tutor-tutee sequence. In the final


combined, entails giving values of —4 for being tutor 
then tutee and +4 for being tutee then tutor. (The 
figures used included the -.5 and +.5 constants.)
Concerning equity, values of —4, +4, and 0 are given to 
represent, respectively, a change from strong equity to 
strong inequity, strong inequity to strong equity, and 
all changes involving weak inequity. As above, sum-
mation produces the 6-weight global change model (see
Table 1). Each weight is equivalent to the algebraic 


sum of the three relevant phase weights of the phase 
model. 
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prospective tutees were drilled in another room on un- 
related mathematics materials by a classroom teacher, 
who might or might not have been their own math 
teacher. This involvement for the tutees was designed 
to attenuate possible differential Hawthorne effects. In 
the remaining 30-35 minutes, the class was reassembled 
and the dyadic tutoring took place in the presence of the 
experimenter and one or both teachers. On the last day 
of each week, the tutors were given review materials to 
guide their tutees' (and their own) preparation for the 
weekly test, the results of which were shared with the 
teacher but not with the student. Following adminis- 
tration of this weekly test to all participants, an at- 
titudinal questionnaire was administered to all as well. 
It consisted of (mostly) 6-point rating scales dealing 
with perceptions of the student's own performance and, 
in the area of satisfaction, dealing with feelings of en- 
joyment, importance, usefulness, being the boss, pride, 
desire to switch jobs (5-point scale), and feelings of 
owing the partner, that is, indebtedness. The end- 
points of the 6-point scales were labeled by words such 
as proud and not proud; in addition, each scale was 
represented pictorially by a graded series of smiling to 
frowning faces. We had encountered mixed results in 
the past with rated enjoyment and rated feelings of in- 
debtedness, perhaps because the measures include too 
much “semantic noise.” For example, in the case of 
indebtedness, we found some early evidence that it had 
positive connotations for some participants. Never- 
theless, we retained these scales in the interest of rep- 
lication. 

Some further comments are in order on the psycho- 
metric properties of the nine measures used as depen- 
dent variables, particularly since most of them involve 
rating scales of limited range. Concerning reliability, 
the item stability correlations between Weeks 1 and 2, 
i.e., for Phase 1, ranged from +.37 to +.66, the average 
r being +.52. Reliabilities ranged from —.40 to +.48 
between Weeks 2 and 3, i.e., during Phase 2, with an 
average r of +.15. These lower magnitudes are to be 
expected in view of the role exchange in Phase 2. The 
stability correlations between Weeks 3 and 4, i.e., during
Phase 3, ranged from +.36 to +.74, the average r being 
+.61. Since, for a df of 212, an r of .225 would be sig- 
nificant at the .001 level, the items clearly show reli- 
ability when they are expected to. 


Internal consistency of the items is shown by the fact


that the average intercorrelations among the nine items 
were +.24, +.28, +.27, and +.37 for the 4 successive 
weeks. While low, they are significantly positive. The 
construct validity of such items was, of course, shown 
by the significant univariate and multivariate contrasts 
found in the college studies. Their construct validity 
in the present studies is what is being tested.
Differences in the procedures of the two public school
studies, although they did not produce significant dif-
ferences in results, should be noted. First, the end-
points on several of the affective assessment scales were
adjusted slightly in the second study to reduce possible 
floor or ceiling effects. Another change involved using 
achievement test items from teacher manuals and from 
standardized tests rather than creating test items, which 
had been the case in the first study. Finally, the 
sixth-grade students in the first study were learning 
fourth-, fifth-, or sixth-grade materials, whereas the 
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second study included sixth- and eighth-grade students 
learning material at their appropriate grade levels. In
either case, our materials were based on the same text 
pages that the teachers were to cover in that 4-week 
period. 

Because of these procedural and sample differences 
between the two studies, the data of each study were 
standardized relative to that study. To assess whether 
the two studies produced comparable results and would 
justify making one rather than two tests of each model, 
multivariate analyses of variance and univariate anal-
yses of variance were conducted on the standardized
data, with study (first vs. second) treated as one of the 
independent variables. None of the multivariate 
analyses of variance on the interactions of the study 
factor with the experimental variables even approached 
significance. Of the nine univariate analyses, there was 
one significant interaction effect, which was, however, 
ordinal. Specifically, on the rating of proud, the 
changes as a function of role sequence were in the pre- 
dicted direction; however, they were stronger in the 
second than in the first study. In sum, these prelimi- 
nary analyses justified combining the standardized data 
of both studies for the crucial contrast analyses. 
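
The pooling step can be pictured with a minimal sketch. It assumes that "standardized relative to that study" means conversion to z scores using each study's own mean and standard deviation; the report does not spell out the exact procedure, and the scores below are invented for illustration only.

```python
from statistics import mean, stdev


def standardize(scores):
    """Convert raw scores to z scores relative to their own study."""
    m = mean(scores)
    s = stdev(scores)  # sample SD; whether n or n - 1 was used is not stated
    return [(x - m) / s for x in scores]


# Hypothetical raw scores from two studies that used different scales.
study_1 = [3.0, 4.0, 5.0, 6.0]
study_2 = [10.0, 20.0, 30.0]

# After within-study standardization, the two sets share a common metric
# and can be combined for a single test of each model.
pooled = standardize(study_1) + standardize(study_2)
```

Standardizing within study removes between-study differences in level and spread, so any remaining study-by-treatment interaction reflects differences in the pattern of results rather than in the scales used.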


Results 


Two multivariate contrast analyses were 
conducted. The conventional multivariate 
analyses of variance and univariate analyses 
of variance alluded to above provided the 
error components for these contrast analy- 
ses. For the first multivariate contrast 
analysis, the 18 theoretical weights of the 
phase change model were applied as coeffi-
cients to the 18 respective treatment means.
In order to involve all of the treatment
means in a manner that did not violate the 
statistical requirement that the weights sum 
to zero, a relatively negligible constant of +.5 
was added to every treatment coefficient 
involved in the tutee-to-tutor sequence, and 
a complementary constant of —.5 was added 
to every treatment coefficient in the tutor- 
to-tutee sequence. For the second multi- 
variate contrast, the six theoretical weights 
of the global change model were applied as
coefficients to the respective six treatment 
means and similarly adjusted by the negli- 
gible constant of +.5 or —.5 (see Footnote 1 
and Table 1). 

The phase change model test yielded a
highly significant result, F(9, 346) = 15.57,
p < .001. Similarly, the global change test
was highly significant, F(9, 200) = 18.24, p
< .001. By way of illustration, global and
phase change means are shown in Table 2 for 
Table 2
Means and Standard Deviations of
Standardized Phase Analysis and Global
Change Scores on the Desire Not to
Switch Jobs


Initial tutor's competence 
relative to tutee’s 


Greater Equal Lesser 

Tutor to tutee 
—.08 —.25 —.18 
M 95 98 
-147 —12 —.35 
1.61 1.54 151 
4.02 4.28 =ll 
13 82 1.07 

Global change” 

M —2.39 —140 —1.00 
2.90 2.26 2.83 


SD 
Tutee to tutor 


Phase 1 
M 4.26 —.06 3.30 
SD 87 1.01 83 
Phase 2 
M +.63 +.59 4.89 
SD 144 147 147 
Phase 3 
M -.21 4.03 +.04 
SD 87 93 AT 
Global change” 
M +1.25 +1.16 4242 
SD 2.70 2.38 2.62 
Note. More positive scores indicate greater desire not to switch
jobs.

a Phase 1 represents the Week 2 - Week 1 difference, Phase 2
represents the Week 3 - Week 2 difference, and Phase 3 rep-
resents the Week 4 - Week 3 difference.
b Global change means represent the difference between the
combined scores of the last 2 weeks (Weeks 3 and 4) and the
combined scores of the first 2 weeks (Weeks 1 and 2).


the desire to switch jobs. The univariate Fs
for both analyses are cited in Table 3, while
all global change means are shown in Ta-
ble 4.
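
As a rough check on how closely the obtained means follow the theoretical ordering, the six adjusted global change weights (Table 1) can be correlated with the six global change means for the desire not to switch jobs (Table 2). The sketch below is illustrative only: the helper names are hypothetical, and the critical value of t (2.132 for a one-tailed .05 test with 4 df) is taken from standard tables.

```python
import math

# Cell order: tutor-to-tutee (greater, equal, lesser),
# then tutee-to-tutor (greater, equal, lesser).
weights = [-8.5, -4.5, -0.5, 0.5, 4.5, 8.5]       # Table 1, global change
means = [-2.39, -1.40, -1.00, 1.25, 1.16, 2.42]   # Table 2, global change M


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def critical_r(t_crit, df):
    """Smallest r that reaches significance, given the critical t and df."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


r = pearson_r(weights, means)          # about .94 from the rounded means
r_needed = critical_r(2.132, df=4)     # about .73
```

Computed from the rounded tabled means, the correlation comes out at about .94, close to the r(4) = .93 listed for this measure in Table 3; the small gap presumably reflects rounding. Either value comfortably exceeds the .73 needed at the .05 level with a one-tailed test.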

Inspection of the univariate F values (see 
Table 3) suggests that in terms of order of 
magnitude, the relative contributions of the 
various dependent variables to each overall
change analysis were quite similar. Thus,
the four measures that concerned a student's
own role evaluations, namely, feeling im-
portant, useful, proud, and like the boss (our
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junior version of “powerful”), while in the 
role, showed highly significant effects in each 
analysis. So, too, did the measure of par- 
ticipants’ own desire (not) to switch jobs 
with the partner. It was not surprising to 
find that once again ratings of own enjoy- 
ment (a measure we had long before come to 
feel was too blatant) and felt indebtedness 
(i.e., owing the partner something) failed to 
contribute significantly to either change 
analysis. 

Both models successfully predicted how 
participants’ evaluations of their own test 
performance would change. Only the global 
change model succeeded in predicting 
changes in actual test performance. It 
should be noted that the overall changes in 
actual test achievement were significantly 
correlated with changes in perceived own 
performance (r̄ = +.29, p < .001). In this
connection, the correlations between per-
ceived own performance and actual
achievement on the four successive weekly
assessments were +.23, +.44, +.23, and +.28,
respectively (r̄ = +.30, p < .001). Although
changes in actual achievement were not 
significantly correlated, on the average, with 
the various satisfaction measures 
changes in perceived perfor-
mance were significantly associated, on the
average, with changes in satisfaction (r̄ =


to find, on examining 
for role sequence, that 
(felt indebtedness), 
Week 2 


tual, while the tutees who 


perceived and ac i s 
the predicted incre- 


became tutors showed 


ments. 
In addition to the univariate Fs, Table 3 


also lists two sets of correlations. Each 
correlation shows in more conservative 
fashion, but more graphically than does the 
corresponding univariate contrast F, how 
well the order of magnitude of the obtained 
treatment means on the measure in question 
approximated the order called for by the 
adjusted theoretical weights of the two 
models. The rs simply consist of the prod-
uct-moment correlation between the 6 and
Table 3
Univariate Contrast Analyses of Global and Phase Change and Correlations Between the
Adjusted Theoretical Weights of Each Model and the Obtained Treatment Means

                                           Global change                Phase change
Dependent measure                     r(4)   F(1, 208)a    p       r(16)   F(1, 346)b    p

1. Perceived own test performance     .68       8.36      .01      .58        5.02      .025
2. Own enjoyment                      .29      <1         ns       .26        1.01      ns
3. Important                          .76      36.29      .001     [?]       31.29      .001
4. Useful                             .82      22.93      .001     [?]       18.89      .001
5. Being the boss                     .85     106.81      .001     [?]      104.15      .001
6. Proud                              .84      23.22      .001     .68       17.07      .001
7. Own desire (not) to switch jobs    .93      72.28      .001     .74       61.93      .001
8. Own felt indebtedness              .44      [?]        ns       [?]       [?]        ns
9. Actual test performance            .55       4.51      .05      .22      <1          ns

a For the multivariate contrast analysis, F(9, 200) = 18.24, p < .001.
b For the multivariate contrast analysis, F(9, 346) = 15.57, p < .001.


18 respective weights of the models and the 
6 or 18 appropriate cell means. Note that 
respective rs of .73 and .40 are needed to 
satisfy the .05 significance level, using a 
one-tailed test. The Fs are, of course, in- 
sensitive to direction, whereas the correla- 
tions are not. Consequently, a one-tailed 
test is appropriately applied to the above rs.
By this yardstick, five of the nine global 
change rs satisfy the .05 level of significance, 
while six of the nine phase change rs reach 
l should occasion 
no surprise that the magnitude of the Fs and 
are themselves 


present results, this 
out the possibility that other orthogonal
comparisons might also have been signifi-
cant. In order to determine whether, in fact,
any substantial portion of the variance (ex-
clusive of error) was still unaccounted for,
multivariate and univariate residuals were
computed on the global change data. The
multivariate residual contrast turned out to
be highly significant, χ²(36) = 76.10, p <
.001. Six of the univariate residuals were
significant, namely, feeling important, F(4,
208) = 6.10, p < .001; useful, F(4, 208) =
2.61, p < .05; like the boss, F(4, 208) = 10.87,
p < .001; proud, F(4, 208) = 2.45, p < .05;
desire to switch jobs, F(4, 208) = 2.61, p <
.05; and actual test performance, F(4, 208)
= 2.62, p < .05. Because the more complex
phase change model involved both between
and within components, calculation of re-
sidual contrasts on the phase change data
seemed too indefensible an undertaking.
Inspection of the outcomes in the two sets 
of univariate tests (see Table 3) reveals 
smaller Fs on nearly every measure used in 
the phase change analysis than in the global 
change analysis. This is not surprising. 
While the phase model is associated with 
considerably more degrees of freedom, it 


imposes a more exacting set of requirements


on the data. Since the phase change model 
calls for least change in Phase 1 and most 
change in Phase 2, changes of small magni- 
tude are likely to be obscured by the greater 
unreliability that is introduced by using the 
more discrete, that is, the more circum-
scribed, temporal units of the phase change
approach than the global change approach. 
This problem would also manifest itself as a 
greater restriction of range in correlations 
pertaining to Phases 1 and 3 than to Phase 
2 and, consequently, as relatively smaller
intercorrelations in Phases 1 and 3.

To illustrate, examination of the within- 
phase intercorrelations of change in the case 
of the four role evaluation measures (im-
portant, useful, boss, and proud) reveals


COMPETENCE AND TUTORIAL ROLE AS STATUS VARIABLES 


Table 4
Means and Standard Deviations of Standardized Global Change Scores Based on 107 Dyads
(Weeks 3 and 4 Combined Minus Weeks 1 and 2 Combined)

                                        Initial tutor's competence relative to tutee's
Dependent measure                        Greater           Equal            Lesser
and role sequence                        M      SD         M      SD        M      SD

1. Perceived own test performance
     Tutor to tutee                     -.65   1.54       -.46   1.67      -.21   2.17
     Tutee to tutor                      .63   2.08       [?]    1.37       .07   1.81
2. Own enjoyment
     Tutor to tutee                     -.29   1.90        .08   2.16      -.57   1.91
     Tutee to tutor                      .69   2.08       [?]    1.87       .03   2.06
3. Important
     Tutor to tutee                    -1.37   2.18      -1.08   2.35      -.82   1.95
     Tutee to tutor                     1.53   2.44       [?]    1.84       .93   1.95
4. Useful
     Tutor to tutee                    -1.07   2.13       -.52   1.85      -.66   2.10
     Tutee to tutor                      .91   1.84        .56   1.93      [?]    1.75
5. Being the boss
     Tutor to tutee                    -2.11   2.24      -1.66   2.02     -1.84   2.41
     Tutee to tutor                     1.78   2.22       1.60   2.24      2.46   2.27
6. Proud
     Tutor to tutee                    -1.29   1.98       -.24   1.83      -.73   2.34
     Tutee to tutor                      .81   2.04        .44   2.03       .95   2.12
7. Own desire (not) to switch jobs
     Tutor to tutee                    -2.39   2.90      -1.40   2.26     -1.00   2.83
     Tutee to tutor                     1.25   2.70       1.16   2.38      2.42   2.62
8. Own felt indebtedness
     Tutor to tutee                      .00   1.88       -.21   1.30      -.24   [?]
     Tutee to tutor                     [?]    [?]        [?]    [?]       [?]    [?]
9. Actual test performance
     Tutor to tutee                      .00   1.61       -.56   1.63      -.56   1.97
     Tutee to tutor                     [?]    1.57        .12   1.41       .26   1.92


smaller average correlations in Phase 1 (r̄ =
+.26) and Phase 3 (r̄ = +.32) than in Phase
2 (r̄ = +.47). In other words, the relatively
greater average intercorrelation in Phase 2
than in the other two phases could be due
partly to what the theory calls for as a con-
sequence of role exchange and partly to the
relatively greater unreliability of change
scores in Phases 1 and 3. That unreliability
would contribute to error variance.

There is a substantive issue that also 
needs to be examined in connection with the 
phase change model. According to that 
theoretical model, greater stability is called 
for in Phase 1 than in Phase 3. Recall, 
however, that item stability was greater on 
the average in Phase 3 (r̄ = +.61) than in
Phase 1 (r̄ = +.52). This was the case with
all but one of the variables, namely, per- 
ceived importance. In fact, the respective 




stability correlations for actual test perfor- 
mance were +.37, +.29, and +.58 for Phases 
1, 2, and 3, respectively. This reversal of 
direction from what the phase change model 
called for in the first and third phases 
probably reduced its predictive effective- 
ness, at least with respect to the data at 
hand. A possible explanation for this re- 
versal is discussed below. 


Discussion 


It will be recalled that certain change 
models, which had proven successful for 
predicting dyadic peer-tutoring changes in 
college-level classmates, were adapted to the 
public school setting for purposes of con- 
ceptual replication and generalizability. 
The results obtained in using these newer 
models, the relatively elaborate phase 
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change model and the more abbreviated 
global change model, for predicting the 
outcomes of peer tutoring in the public 
school milieu fully justified our efforts. 
They indicate that changes in status level 
and status equity (status congruence) are 
important determinants of task-relevant and 
morale-relevant aspects of the tutoring ex-
perience.

As predicted, once tutors were made to 
exchange roles with their tutees, these for- 
mer tutors exhibited a decline in perceived 
performance and satisfaction. By way of 
contrast, their former tutees, now tutors, 
showed an increase in these variables. What 
makes these results particularly exciting is 
that the combinations of status level and 
status equity exerted their effects, even 
though the tutoring was in a subject matter 
that probably lacked intrinsic appeal for 
many of the participants, namely, mathe- 
matics. This vindicates our initial suppo- 
sition that the symbolic value of status may 
often transcend its perceived instrumental
value (i.e., the value of learning math) in the 
short run and, at least, leaves 
sibility that in the long haul the motivation 
of the former 

It was also 


namely, 
theoretical reasons) of initial relative com-
tions of their own relative performance were
relatively accurate, which they were, this too
would have served to magnify both satis-
faction and actual achievement differences.
Both such procedural modifications would
increase the likelihood that changes in sat-
isfaction and actual performance operate in
tandem.

Needless to say, questions of absolute
levels of attainment in mathematics through
tutoring were not addressed through the
present designs. To pursue such questions,
these designs would need to be expanded to
include control peers, who received equal but
more traditional exposure to the materials
used in tutoring.

It is not too surprising that the relatively 
exacting model of phase change, although it 
clearly proved to have predictive value, was 
not quite as successful as the global change 
model. First of all, the phase change model, 
which incorporates much finer nuances relative
to changes in status equity than does
the global change model, would be especially 
vulnerable to a weak manipulation of rela- 
tive initial competence. In addition, while 
the phase change model predicts greater 
stability in the first than in the third (final) 
correlations were found 
to be greater in the third than in the first 
reason for this reversal is 
that the participants were unfamiliar with 


because they were introduced to those instruments
for the
first week. This


While the models were shown to have
predictive value, the presence of significant
residuals indicates that the models do not


be seen by inspecting the
global change means in Table 4, that while
the decrements were especially marked
and large when a more competent partner
who had been the tutor was made the tutee,
the greatest increments were not necessarily
shown by the superior tutee who became a
tutor. While the extreme responses of the
superior former tutor are completely in accord
with the theory, the less-than-extreme
responses of the superior former tutee are
not. Why this asymmetry occurred is not
clear. Still, these discrepancies do not nec- 
essarily call for a modification in the theo- 
retical models or for their replacement by 
new ones. A more judicious approach 
would first entail further replication. The 
replication should involve a stronger ma- 
nipulation of relative competence and an 
attempt to familiarize the participants with 
the instruments before the experiment be- 
gins. Such methodological alterations 
would provide a more optimal test of the 
present models. 

Along with such alterations, it might be 
extremely useful to observe what the tutor- 
ing process is like in these dyads. It may 
turn out that the process varies as a function 
of whether the dyad was equitably consti- 
tuted and that this differential process, in 
turn, has a meaningful impact on the out- 
come variables. 

This latter point might be stated more 
generally. Although our focus has been on 
structural variables, this is not to deny the 
importance of also investigating, through 
direct observation, the relatively neglected 
research area (Feldman, Devin-Sheehan, & 
Allen, 1976) of the dynamics of peer-tutoring 
interaction. The objective of such an ob- 
servational undertaking would be to deter- 
mine how structural factors, such as group 
composition, and task variables, such as 
subject matter, affect the nature and quality
of the tutoring process, and how this process 
in turn influences actual achievement and 
morale. 

Another issue that needs to be explored is 
whether status considerations continue to 
remain salient for long-term participants in 
same-age peer tutoring. While the present 
studies were carried out over a longer period
of time than were the college studies, it remains
to be seen whether the motivational
value of (changes in) status conditions would
continue or attenuate if the tutoring were to
continue. 

A related issue for which no systematic 
data exist concerns the effects on both per- 
formance and satisfaction of reconstituting 
each tutorial dyad. In keeping with the 
present focus on the structural features of 
peer tutoring, the comparisons of interest are 
those between dyads that undergo different 
kinds of changes in one or more of the major 
parameters, that is, role change, partner 
change, and change in the relative initial 
competence of the new partners from that of 
the old. For instance, partners and/or roles 
might be traded. Furthermore, partners of 
equal (unequal) relative competence might 
obtain new partners again of equal (unequal) 
competence, or they might now obtain new 
partners of unequal (equal) competence, and 
vice versa. Comparisons among such con- 
ditions would provide a partial answer to the 
question of what form(s) of same-age tutorial 
pairing is optimal for both performance and 
morale. 

Further light on the issue of optimal 
pairing might, perhaps, also be cast by in- 
vestigating whether the type of set under 
which dyad partners operate affects the 
differential outcomes of the partners and the 
outcomes of the dyad considered as a unit. 
In the present studies, relations within the 
dyad were probably made salient, an em- 
phasis that may well have contributed to the 
invidious comparisons and induced indivi- 
dualistic, if not competitive, orientations. It 
may well be, however, that such comparisons 
and orientations would have been atten- 
uated and replaced by a set of promotive 
interdependence with beneficial results if the 
tutorial partners had been led to believe that 


2 One alternative model that clearly would fail is that 
which simply calls for a preference for equality in out- 
comes (cf. Lerner, 1974). Translated into the present 
context, such a model would imply that both parties in 
the tutoring dyad would wish to share to an equal extent 
the more desirable role of tutor and the less desirable 
role of tutee, regardless of relative competence. Since 
role exchange brings about an equal sharing of both 
roles, such an equality model would call for former tu- 
tors who became tutees to be no less satisfied, subse- 
quent to the exchange, than their former tutees who 
became tutors. This clearly did not happen: While the 
former tutees became more content, the former tutors 
became less content.
the objective was to see how well the dyad
performed in competition with other dyads 
in the classroom, or if each participant were 
led to believe that he or she was competing 
with same-role (and/or same-ability) par- 
ticipants in other dyads. 

In addition to the induction of a set that 
stresses cooperation within the tutoring dyad 
but competition between dyads, it would also 
be fruitful to investigate both the individual 
and dyadic effects of introducing interim 
feedback concerning the previous perfor- 
mance of partners and dyads relative to one 
another. To this end, some of the interesting
features of the “teams-games-tournament”
approach (DeVries & Edwards,
1974; DeVries & Slavin, Note 2) might be
incorporated. 

Up to this point, we have not dealt with 
the concerns of the school practitioner. 
Since many, if not most, educators would 
consider the main goal of peer tutoring to be 
improved learning, might not the attitudinal 
or affective changes that these experiments 
have demonstrated seem tangential or ex- 
traneous to that goal? Our position is that 
while classroom satisfaction and achieve- 
ment are not interchangeable products, they 
apparently do function in some interde- 
pendent fashion. While their precise causal 
interconnections have yet to be determined, 
it is safe to say that the practitioner may well 
find signs of change in one to be symptom- 
atic or predictive of change in the other. 

This still leaves open the practical question
of how to pair classmates for tutoring.
Granted, systematic investigation is still 
needed on precisely how much differential 
change in both achievement and satisfaction 
is brought about by pairing classmates of 
different degrees of competence. At this 
juncture, the practitioner who is intent on 
optimizing the joint gains of the classroom 
partners comprising the average tutoring 
dyad can proceed with confidence on the
assumption that role exchange partway
through the sequence of tutoring activities 
is a reliable way of assuring such gains, since 
it operates to balance off, over the long haul, 
the change increments in achievement and
satisfaction of one partner relative to those
of the other partner.


Category Relations and Syllogistic Reasoning


Russell Revlin, Karl Ammerman, Kirk Petersen, and Von Leirer 
University of California, Santa Barbara 


The present article evaluates the conversion model of formal reasoning for its
ability to predict the decisions made by reasoners when solving concrete and
abstract syllogisms. Four groups of students were asked to solve two sets of
categorical syllogisms. The groups differed in the kind of inclusion relation
expressed by the premises on the first set of problems (e.g., abstract, arbitrary,
convertible, or unconvertible). In the second set, all groups solved identical
abstract syllogisms. The reasoners' decisions were compared with the predictions
of the conversion model and support the model's contentions that (a)
reasoners’ decisions reflect natural language processes in the encoding of syllogistic
premises and (b) reasoners’ decisions appear to follow rationally from
their understanding of the problem’s materials. Pedagogic implications of
these findings are considered.


Categorical syllogisms have been used 
for centuries as a standard against which 
human rationality could be assessed, but 
they present us with a psychological anom- 
aly: Otherwise rational students appear to 
reason irrationally on such problems. For 
example, consider the following: 


1. All P are M.
Some M are S.


On these types of problems, the reasoner's
task is to determine if a single, unambiguous
relation holds between S and P, given the
information in the premises (i.e., P is related
to M, and M is related to S). In the case of
Syllogism 1, as many as 80% of college students
(Revlis, 1975b) claim that the propositional
conclusion Some S are P necessarily
follows from the premises. That such
a conclusion does not necessarily follow
should be clear from Syllogism 2, which has
the same logical form but with a more
transparent content.


2. All horses are animals.
Some animals are flying creatures. 


Clearly, in this example, few would conclude 
that Some flying creatures are horses, al- 
though this is the same type of decision 
commonly reached in Syllogism 1. 
Although the difficulties reasoners en- 
counter while solving syllogisms are well 
documented (e.g., Wilkins, 1928), only re- 
cently have process models of reasoning been 
proposed to account for error data (e.g., Er- 
ickson, 1974, 1978; Revlis, 1975a, 1975b). 
While previous hypotheses concerning
sources of error in the solution of categorical
syllogisms (e.g., Chapman & Chapman, 1959;
Woodworth & Sells, 1935) received wide
currency, they were insufficiently specified
to permit a strong test (see Revlis, 1975b).
Accounting for reasoners’ decisions is particularly
important, since syllogisms have a
long history of use on general intelligence
tests (e.g., Guilford, 1959; Thurstone, 1938)
and in investigations of memory processes
(e.g., Frase, 1966; Whimbey & Ryan, 1969;
Erickson, Note 1). Attempts to improve
students' reasoning accuracy through
training programs (e.g., Dickstein, 1975;
Henle & Michael, 1956; Simpson & Johnson,
1966; Whimbey & Ryan, 1969) have also
produced equivocal results. It is difficult to
account for differential training outcomes in
the absence of a model of syllogistic reasoning
that predicts both reasoning errors
and reasoning successes. It is the purpose
of the present article to describe and test
such a candidate model of categorical rea- 
soning, called the conversion model (Revlis, 
1975b), and to consider its theoretic and 
pedagogic implications. 


Conversion Model 


The conversion model challenges the 
prevailing view that human reasoners are 
faulty information processors (cf. Frase, 
1966; Simpson & Johnson, 1966). The 
model asserts that errors are attributable to 
students' understanding of the meaning of 
the sentences in the syllogism: When their
understanding of the sentences is taken into
account, their decisions are both predictable
and rational.

The conversion model is based on the view 
that reasoning errors result primarily from 
the way in which syllogistic premises are 
encoded and not from a faulty inference 
mechanism. The major source of error in 
encoding is said to be illicit conversion. 
That is, when the reasoner is told that All A
are B, he or she interprets this proposition
to mean that the converse All B are A is also
true. Illicit conversion as a source of errors
in syllogistic reasoning was suggested by 
Chapman and Chapman (1959) and has been 
embodied in a formal, testable model by 
Revlis (1975a, 1975b). The importance of 
conversion for categorical inference is illustrated
by comparing Syllogisms 1 and 3.


1. All P are M.        3. All M are P.
Some M are S.          Some S are M.


If a student were reasoning logically on 
Syllogism 1, the student would claim that no 
valid conclusion is possible. However, if 
while encoding the premises, the reasoner's 
understanding of the sentences was such
that he or she converted each one in turn, the 
problem would appear as Syllogism 3. This 
converted syllogism does have a solution, 
that is, Some S are P, which is just the con- 
clusion that reasoners reach when shown 
Syllogism 1. As a result of conversion, a new 
problem is created with a conclusion that
is inappropriate for the original syllogism.
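
The contrast between Syllogisms 1 and 3 can be checked mechanically. The following Python sketch is not part of the original study. It assumes the usual set reading of the quantifiers (All A are B as A being a subset of B, Some A are B as a nonempty intersection, with empty classes allowed) and enumerates every way of assigning three individuals to the classes S, M, and P. The names `worlds` and `entails` and the three-individual domain are illustrative assumptions; enumeration over a small domain illustrates the point rather than proving it in general.

```python
from itertools import product


def worlds(n_individuals=3):
    """Yield every assignment of n_individuals to the classes S, M, and P."""
    # Each individual is described by three flags: (in S, in M, in P).
    memberships = list(product([False, True], repeat=3))
    for world in product(memberships, repeat=n_individuals):
        s = frozenset(i for i, flags in enumerate(world) if flags[0])
        m = frozenset(i for i, flags in enumerate(world) if flags[1])
        p = frozenset(i for i, flags in enumerate(world) if flags[2])
        yield s, m, p


def all_are(a, b):
    """All A are B: A is a subset of B."""
    return a <= b


def some_are(a, b):
    """Some A are B: A and B share at least one member."""
    return bool(a & b)


def entails(premises, conclusion, n_individuals=3):
    """True if the conclusion holds in every world that satisfies all premises."""
    return all(
        conclusion(s, m, p)
        for s, m, p in worlds(n_individuals)
        if all(premise(s, m, p) for premise in premises)
    )


# Syllogism 1 as presented: All P are M; Some M are S.
syllogism_1 = [lambda s, m, p: all_are(p, m), lambda s, m, p: some_are(m, s)]
# Syllogism 3, the converted form: All M are P; Some S are M.
syllogism_3 = [lambda s, m, p: all_are(m, p), lambda s, m, p: some_are(s, m)]


def some_s_are_p(s, m, p):
    return some_are(s, p)


print(entails(syllogism_1, some_s_are_p))  # False: the conclusion is not forced
print(entails(syllogism_3, some_s_are_p))  # True: the converted problem has a solution
```

Under these assumptions, Some S are P fails for the presented problem but holds for the converted one, which is the pattern the conversion account relies on.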

Revlis (1975a, 1975b) formulated a model
of reasoning with conversion as well as other
processing mechanisms as its core. This
model has been shown to be effective in
predicting the reasoner's decisions on both
abstract problems (where letters represent
categories) and concrete ones (where real-world
categories are used). It specifies
precisely those conditions under which 
knowledge of the categories reasoned about 
will affect (both positively and negatively) 
the deductions reached. The model claims 
that students will make reasoning errors 
primarily in cases where encoding the 
premises entails conversion of the problem
to one in which a different conclusion is required.
The model predicts that students
will be correct in their judgments in either
of two conditions: (a) when the problem is
converted, but the conclusion is fortuitously
the same in the converted and original forms
of the problem; and (b) when the reasoner’s


There is a second source of reasoning er- 
rors described by the conversion model that 
is not directly related to the encoding pro- 
cess. The model claims that students es- 
chew indeterminate conclusions (i.e., “none 
of the above"-type answers). When faced 
with having to reach such conclusions under 
the severe time constraints of the present
paradigm, reasoners are said to randomly
select a conclusion from the available set.
This introduces an irrational component to
the decision process on a specifiable subset 
of syllogisms. 


Processing Stages 


The conversion model is shown in Figure 
1. The model consists of four processing 
stages: (a) an encoding stage, in which the
individual premises are assigned a semantic
reading; (b) a composite stage, in which the
readings given to the individual premises are
concatenated to produce a single predicate
representing the reasoner's understanding
of the problem; (c) a conclusion encoding
stage, which is similar to the premise encoding
stage; and (d) a comparison stage, in
which the composite predicate is compared
with the candidate conclusion. This last
stage also includes a decision substage in
which the reasoner selects the response.
When time or the reasoner’s motivation is
exhausted (which clearly varies with the individual),
the reasoner makes a fair guess
from among the available conclusions.


Figure 1. The conversion model of formal reasoning (after Revlin & Leirer, 1978). [Flowchart; recoverable box labels: PREMISE ENCODE, COMP CONCATENATE, CONCL ENCODE, ACCEPT, Meaning Stack.]


The PASS and GUESS mechanisms presuppose
that people are biased against accepting
a "none of the above"-type conclusion.
The plausibility of this assumption is
well recognized (e.g., Chapman & Chapman, 
1959; Dickstein, 1975; Revlis, 1975b; but see 
also Dickstein, 1976). 

The conversion model is sufficiently de- 
tailed that it specifies the decision that 
should be reached on every syllogism. This 
contrasts with earlier work (e.g., Chapman
& Chapman, 1959) that was largely de- 
scriptive in nature, in which the predicting 
of decisions was neither possible nor appro- 
priate. While only the general predictions 
of the conversion model will be the major
concern here, more detailed predictions and
their accuracy can be obtained elsewhere 
(Revlis, 1975a; Revlin & Leirer, 1978). 


Valid Syllogisms 


Valid syllogisms are ones where a single
conclusion unambiguously follows from the
two premises. The conversion model dis- 
tinguishes several types of valid syllogisms, 
two of which are pertinent here: (a) those 
where conversion results in a syllogism that 
has the same conclusion as the presented 
problem—these are called SAMES; and (b) 
those where conversion produces a syllogism 
with a different conclusion than the one 
presented—these are called DIFFERENTS. 
On both types of valid problems, the reaso- 
ner will find a match between the encoding 
of the premises and one of the conclusions 
offered by the experimenter. The reasoner 
will accept this matching conclusion as ap- 
propriate. The decisions reached on SAMES 
will be scored as correct, while decisions 
reached on DIFFERENTS will be scored as 
incorrect. 


Invalid Syllogisms 


Invalid syllogisms are problems where no 
conclusion unambiguously follows from the 
premises. On these problems, the logically 
required decision is that “no conclusion is 
proven" (i.e., “none of the above”). The 
model distinguishes between two types of 
invalid syllogisms: (a) those where conver- 
sion transforms the syllogism into one with 
a different conclusion than would be pre- 
scribed by a logician—these are called DIF- 
FERENTS syllogisms; (b) those where con- 
version produces a syllogism requiring the 
same conclusion as would be prescribed by 
a logician (in this case, the appropriate conclusion
is “none of the above”); these
problems are called SAMES-N syllogisms. As
with valid syllogisms, on DIFFERENTS
problems, the reasoner will accept the conclusion
that matches the encoding of the
problem and be scored as incorrect. For
SAMES-N problems, the converted syllogism
requires a “none” conclusion, which the
reasoner eschews. A property of such
problems is that the reasoner will always fail
to find an acceptable conclusion under the
time constraint of the present paradigm.
Since sufficient time is not available, the
reasoner is said to make a fair GUESS from 
among the conclusions provided. As a re- 
sult, a correct none response will occur on 
approximately 20% of the problems (when 
there are five alternatives to choose among, 
as in the present study). 

These and other predictions from the 
model were confirmed in other studies on 
both abstract (Revlis, 1975b) and concrete 
(Revlis, 1975a) problems. For example, on 
abstract problems, reasoners were highly 
accurate in their decisions on valid SAMES 
(72.8%) and highly inaccurate on valid DIF- 
FERENTS (12.0%). For invalid SAMES-N 
syllogisms, reasoners were accurate only 
15.3% of the time (close to the predicted
20%), and they were only accurate 6.5% of 
the time on invalid DIFFERENTS. 
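
The qualitative predictions just described can be laid side by side with the reported percentages. The Python sketch below is an illustration only: it assumes an idealized reading of the model in which SAMES are always answered correctly, DIFFERENTS never, and SAMES-N are answered by a fair guess among the five alternatives. The function name `idealized_accuracy` is hypothetical, and the reported values are those quoted above for abstract problems.

```python
def idealized_accuracy(problem_type, n_alternatives=5):
    """Idealized percentage correct for a problem type (illustrative assumption)."""
    if problem_type == "SAMES":
        return 100.0  # converted and original problems share a conclusion
    if problem_type == "DIFFERENTS":
        return 0.0  # the matching conclusion is the wrong one
    if problem_type == "SAMES-N":
        return 100.0 / n_alternatives  # fair guess among the alternatives
    raise ValueError(f"unknown problem type: {problem_type}")


# Percentages reported for abstract problems, as quoted in the text.
reported = {
    ("valid", "SAMES"): 72.8,
    ("valid", "DIFFERENTS"): 12.0,
    ("invalid", "SAMES-N"): 15.3,
    ("invalid", "DIFFERENTS"): 6.5,
}

for (validity, problem_type), observed in reported.items():
    predicted = idealized_accuracy(problem_type)
    print(f"{validity:8s}{problem_type:11s}"
          f"idealized {predicted:5.1f}%   reported {observed:5.1f}%")
```

The idealized values overstate the contrast, but the ordering they imply (SAMES above SAMES-N above DIFFERENTS) agrees with the ordering of the reported percentages.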


Current Issues 


Despite the general confirmation of the 
predictions of the conversion model, it is 
unclear whether these data support the 
underlying assumptions of the model, among 
which is the notion that the processes en- 
tailed in the encoding of the syllogistic 
premises are those normally used to under- 
stand quantified sentences, or specifically, 
that conversion of the premises is neither 
task specific nor a post hoc construct. One 
prediction is that reasoners should reach 
quite different decisions when they convert 
the premises than when they do not convert 
the premises. This blocking of conversion 
is predictable: It should occur in those in- 
stances where the pragmatic rules of English 
prevent a converted interpretation of a 
sentence because it would be semantically
deviant. A way of assessing this interpretation
of conversion is to contrast reasoning
accuracy when the premises have different
semantic content: When conversion is
blocked, the reasoner will be more accurate
on those problems where conversion normally
has a negative effect, that is, in valid
and invalid DIFFERENTS problems. No
change in reasoning accuracy should be
shown on those problems where conversion
normally has no effect, that is, in SAMES and
SAMES-N problems.




Revlis (1975a) provides an informal test
of this assumption by showing that when
universal affirmative sentences are given a
set inclusion interpretation (e.g., All A is a
subset of B), thereby blocking conversion,
reasoning performance improves on
just those problems where conversion has its
effect: DIFFERENTS syllogisms. Unfortunately,
the selection of the materials was
based on the experimenter's intuitions, and
no systematic attempt was made to appeal
to available norms to verify those intuitions
(e.g., Battig & Montague, 1969). One purpose
of the present study is to precisely test
the predictions of the conversion model on
concrete problems with materials whose interpretation
is independently assessed.

A second aspect of the conversion model
is its implicit assumption that the conversion
operation is automatically applied in the
comprehension of quantified relations.
That is, the model claims that the conversion
is not simply one of many ways in which a
person might disambiguate quantified
statements; rather, it is the basic reading
given to such statements. While there is no
independent basis for this assumption, it
satisfies certain model-theoretic considerations.
If the conversion effect were to be
the result of capricious encoding processes
that vary idiosyncratically with subjects and
conditions, we would be no better off than
previous models in having to describe rather
than predict the data. By specifying the
conditions for conversion beforehand, we
maximize falsifiability if the predictions do
not hold and maximize confidence in the
model if they do. Because of this, the model
makes the strongest claim possible: Conversion
is the obligatory encoding for all
quantified sentences, except in those concrete
conditions where it is blocked when
conversion leads to pragmatically deviant
sentences.

While this assumption has face validity
with concrete materials, one wonders how
normal language processing functions are
brought to bear in the interpretation of abstract,
symbolic premises. These are particularly
important to understand, since
much of the research on categorical reasoning
employs symbolic premises. Yet, we
rarely encounter such expressions in our
everyday lives. The conversion model assumes
that we process these expressions as
we do the concrete ones. An alternative 
hypothesis is that symbolic materials are 
translated into concrete expressions in order 
to perform the reasoning task. There are 
many ways in which such translation might 
occur. First, reasoners might make a strict 
translation into a prototypic concrete form. 
If this concrete form permits conversion, we 
have no way of distinguishing between this 
variant of the hypothesis and the conversion 
model’s predictions. Second, reasoners 
might be unsure how to treat symbolic sen- 
tences and translate these into whatever 
concrete form is readily available. If this 
hypothesis is the case, then such translations 
might be related to the prior experience of 
the reasoner. In this condition, the con- 
version model would be underspecified in 
not including such a translation mechanism. 
It would also be in error on half of the problems,
since such translations should be made
equally as often to convertible as to uncon- 
vertible (blocked expressions), assuming, of 
course, that such expressions occur equally 
often in the subject’s environment. 

While this hypothesis is contrary to the 
conversion model, it has interesting peda- 
gogic implications. Since it claims that the 
interpretation of symbolic expressions is 
conditioned by the available concrete ex- 
amples, one could eliminate conversion on 
symbolic premises by providing a series of 
concrete problems where conversion is nat- 
urally blocked—much in the way that a 
teacher might instruct a reasoner against 
converting the premises (see procedures 
by Dickstein, 1975; Simpson & Johnson, 
1966). The present study manipulates the 
immediate past experience of the reasoner 
on concrete syllogisms and tests for the 
presence of conversion on abstract, symbolic 
problems that the reasoner later encounters.


Method


Subjects 


The subjects were 80 men and women volunteers
from introductory psychology classes at a junior college.
None of the subjects had been exposed to a course in
logic. They were run in groups of 20 in sessions lasting
approximately 40 minutes. 


Procedure 


Each subject was randomly assigned to one of four 
content groups (n = 20). The students were told that
they would be required to solve reasoning problems,
where their goal was to decide which of five possible
conclusions had to follow unambiguously from the given
premises (see Syllogism 1). The subjects read the rules
for solving such problems and were shown a sample
problem that was not repeated in the experimental set.
The subjects were instructed to work each problem in
the 30 sec allotted and to proceed to the next problem 
in their booklet only when told to do so. A 30-sec rest 
was permitted after the subjects solved the first 16 
problems. 


Materials 


Students were asked to solve 32 categorical syllo- 
gisms. Half of these problems had a valid conclusion 
and half did not. There were four unique syllogisms of 
each type, with four repetitions of each providing the 
full set of problems. The valid syllogisms, in the tra- 
ditional notational form, were EA-3, EI-1, EI-2, and 
AA-4. The invalid syllogisms were AI-2, OA-1, AE-1, 
and IE-1. Ordering of the problems within the booklets
was randomized, with the restriction that runs of more 
than two instances of a validity type were not permitted. 
Once selected, the order of the problems remained the 
same for all test booklets.
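
The restricted randomization described above can be sketched as follows. This is a hypothetical illustration, not the procedure the authors used: it assumes simple rejection sampling (reshuffle until no run of more than two problems of the same validity type occurs) over the eight syllogism forms, each repeated four times. The function names `longest_run` and `constrained_order` are assumptions introduced here.

```python
import random

VALID = ["EA-3", "EI-1", "EI-2", "AA-4"]
INVALID = ["AI-2", "OA-1", "AE-1", "IE-1"]


def longest_run(flags):
    """Length of the longest run of identical consecutive values."""
    longest = current = 0
    previous = None
    for flag in flags:
        current = current + 1 if flag == previous else 1
        longest = max(longest, current)
        previous = flag
    return longest


def constrained_order(rng, repetitions=4, max_run=2, max_tries=100_000):
    """Shuffle the problems until no validity type runs longer than max_run."""
    problems = [(name, True) for name in VALID] + [(name, False) for name in INVALID]
    problems = problems * repetitions
    for _attempt in range(max_tries):
        rng.shuffle(problems)
        if longest_run([is_valid for name, is_valid in problems]) <= max_run:
            return list(problems)
    raise RuntimeError("no acceptable ordering found within max_tries")


# A fixed seed mirrors the fact that one order was used for all booklets.
order = constrained_order(random.Random(0))
print([name for name, is_valid in order])
```

Rejection sampling draws uniformly from the acceptable orderings, at the cost of discarding most shuffles; a constructive method would be faster but is not needed at this scale.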

There were four different types of booklets (and four
groups of subjects) corresponding to the content of the 
first 16 problems in each booklet. Group AB was asked 
to solve syllogisms that were expressed in the form of 
abstract sentences, where letters stood for class terms 
(e.g., All A are B; see Syllogism 4). 


4. All P are M. 
Some M are S. 
a. All S are P.
b. No S are P.
c. Some S are P.
d. Some S are not P.
e. None of the above is proven. 


The three remaining groups solved concrete problems. 
The universal affirmative sentences used in these 
problems are shown in Table 1. The problems were 
constructed by having one universal assertion (see Table
1) and one arbitrary relation as the other assertion.
Conclusions expressed a relation with - i
truth status.

Group CN solved 16 concrete problems in which
conversion would produce an acceptable interpretation
because the universal sentence and its converse were
both true. A typical problem is illustrated by Syllogism 5.


5. All veterinarians are animal doctors.
Some animal doctors are members of the Finvill; 
Club. 


Group IN solved 16 concrete syllogisms whose uni- 
versal sentence expressed an arbitrary inclusion relation 
on which there is no a priori reason to assume that 
reasoners would block conversion (see Syllogism 6). 


6. All products on the list are types of tools. 
Some types of tools in the store are imported 
objects. 


Group CB solved 16 syllogisms on which conversion 
would be blocked. These problems were constructed 
from Battig and Montague (1969) category norms.
After selecting a predicate category for the universal 
sentence (e.g., tool), the subject category was selected 
to be approximately the twenty-first instance given to 
the predicate category (e.g., crowbar). In contrast with 
the other types of syllogisms, the reversal of subject and 
predicate class relations in Group CB syllogisms results
in a less likely relation (see Syllogism 7).


7. All crowbars are tools. 
Some tools in the store are imported objects. 


The second set of 16 abstract syllogisms was the same
for all groups and had the identical structural form as 
the first set of 16 problems. By manipulating content 
and prior experience in this way, we are able to examine 
both the predictive accuracy of the conversion model 
as well as the plausibility of two of its underlying as- 
sumptions. 


Results 


The accuracy score of each reasoner was 
determined by summing the percentage 
correct for each problem type averaged over 
blocks (although each problem was repeated, 
no practice effect was observed). The data 
were then analyzed according to the speci- 
fications of the conversion model. The re- 
sults are presented below in two parts. The 
first part considers the importance of the 
encoding of inclusion relations for the solu- 
tion of categorical syllogisms. The second 
part examines the importance of the imme- 
diately prior concrete context for the solu- 
tion of abstract, symbolic syllogisms. 


Inclusion Relations 


The conversion model claims that conversion
alters the nature of the syllogism, so
that the reasoner solves a problem that is
substantially different from the one presented.
This transformation diminishes the
accuracy of DIFFERENTS syllogisms while it
leaves SAMES unaffected and differentially
reduces accuracy on invalid syllogisms because
of guessing on SAMES-N problems.
The model therefore directs us to look for an
effect of problem type (i.e., SAMES more
accurately solved than DIFFERENTS) and an
effect of validity (valids more accurately
solved than invalids).

That this model accounts for the data on
abstract syllogisms can be seen from Table
2, which shows that for the AB group, the
accuracy on SAMES syllogisms is greater than
on DIFFERENTS syllogisms, F(1, 19) = 24.5.
This effect is larger for valid syllogisms,
t(38) = 22.5, p < .001, than for invalid
ones, t(38) = 1.5, p > .05, which contributes
to a Validity × Problem Type interaction,
F(1, 19) = 20.0, p < .001. Overall,
valid syllogisms are easier than invalid ones,
F(1, 19) = 21.5, p < .001.


Table 1
Universal Inclusion Relations

Convertible inclusion (Group CN)
1. All totally blind people are sightless people.
2. All physicians are medical doctors.
3. All toxic things are poisons.
4. All brunettes are brown-haired people.
5. All veterinarians are animal doctors.
6. All kin are relatives.
7. All trousers are pants.
8. All children are youngsters.
9. All types of money are kinds of currencies.
10. All annual events are yearly events.

Blocked inclusion (Group CB)
1. All end tables are types of furniture. (20)
2. All pomegranates are fruits. (21)
3. All suits are wearing apparel. (21)
4. All bungalows are dwellings. (20)
5. All crowbars are tools. (21)
6. All second cousins are relatives. (22)
7. All tweeds are fabrics. (23)
8. All bayonets are types of weapons. (21)
9. All snapdragons are flowers. (22)
10. All streetcars are types of vehicles. (22)

Arbitrary inclusion (Group IN)
1. All items on the list are types of furniture.
2. All objects on the counter are fruits.
3. All imported products in the back room are wearing apparel.
4. All Ace Company products are types of dwellings.
5. All products on the list are types of tools.
6. All handymen in Greensboro are relatives.
7. All products from Dave's factory are types of fabrics.
8. All devices on the list are types of weapons.
9. All exotic things in Nancy's shop are types of flowers.
10. All machines in the catalog are types of vehicles.

Note. The rank order of the subject category given as a response to the predicate category in Battig and Montague (1969) is shown in parentheses.

The conversion model makes three testa- 
ble predictions concerning the reasoners’ 
accuracy on the three types of concrete 

Table 2
Percentage of Reasoning Accuracy on Problems with Different Categorical
Expressions (Set 1)

                                  Valid syllogisms      Invalid syllogisms
Expressed relation                SAMES   DIFFERENTS    SAMES-N   DIFFERENTS
Abstract (Group AB)               52.8      12.6         13.5        8.1
Convertible (Group CN)            52.5      17.5         33.8        5.0
Blocked (Group CB)                66.3      53.8         45.9       38.8
Arbitrary inclusion (Group IN)    53.8      10.0         22.5       21.3


problems included in the present study. 
The first claim of the model is that when the 
class inclusion is convertible, reasoners will 
show the same effect of conversion on con- 
crete problems as they do on abstract ones. 
In this case, the data for the CN and IN 
groups should be similar to those for the AB 
group. The reasoning accuracy of the con-
crete CN group is presented in Table 2,
which shows that, like the AB group, reaso-
ners are more accurate on SAMES than on
DIFFERENTS, F(1, 19) = 30.5, p < .001, 
and more accurate on valid problems than on 
invalid ones, F(1, 19) = 6.5, p < .05. No 
significant interaction between validity and 
problem type was observed. A direct com- 
parison between the AB and CN groups 
shows similar conversion effects (no signifi- 
cant Groups X Problem Type interaction 
was observed). The only difference between 
the two groups is that the CN group was 
more accurate in solving SAMES-N syllogisms 
than the AB group [Groups x Validity x 
Problem Type: F(1, 38) = 5.5, p < .05]. 
The present data support the general claim 
of the conversion model that conversion 
participates in the encoding of concrete as 
well as abstract materials.

The comparison between the AB and CN 
groups is noteworthy in demonstrating no 
overall superiority of concrete syllogisms (see 
Wilkins, 1928). This supports the conten- 
tion of Revlis (1975a) that the effect of con- 
creteness is only seen when it clarifies the 
meaning to be given to the premises. When 
the concrete materials are given the same
interpretation as the abstract symbolic ma- 
terials (Groups CN and AB), no difference 
in reasoning accuracy should be found. 

The data from the CB group are presented 
in Table 2, which shows that as predicted (a) 
there is no significant conversion effect (ei- 
ther overall or separately) for valid and in- 
valid syllogisms and (b) there is a significant 
validity effect, F(1, 19) = 6.9, p < .05. 

A comparison between the two concrete 
groups (CN and CB) shows that students are 
more accurate on blocked than unblocked 
problems, F(1, 38) = 11.7, p <.01. As pre- 
dicted, this is due to a significant interaction 
between the type of relation (blocked-un- 
blocked) and the type of problem (SAMES- 
DIFFERENTS), F(1,38) = 3.7, p > .05. That 
is, the two groups differ on just those prob- 
lems that the conversion model predicts are 
affected by the presence of a conversion op- 
eration: DIFFERENTS syllogisms. 

The third claim of the conversion model
is that conversion is blocked because of the
pragmatic implications of the converted
encoding and is not a result of simply having 
an explicit class inclusion relation per se. 
This claim was tested by the IN group. Data 
for this group are presented in Table 2, 
which shows that arbitrary inclusion rela- 
tions exhibit similar response patterns as 
concrete conversion (Group CN) or abstract 
(Group AB) problems: (a) SAMES are solved 
with greater accuracy than DIFFERENTS, 
F(1, 19) = 18.9, p < .001; and (b) valid syl- 
logisms are easier to solve than invalid ones, 
F(1, 19) = 4.3, p = .05. However, the prob- 
lem-type effect is absent from the invalid 
syllogisms [Problem Type X Validity: F(1, 
19) = 30.3, p <.001]. A comparison of the 
CN and IN groups shows no main effect of 
groups nor any interaction between groups 
and problem type. A comparison of the two 
inclusion groups (CB and IN) shows that 
while students are more accurate in solving 
Group CB syllogisms than Group IN syllo- 
gisms, F(1, 38) = 18.7, p < .001, this effect is 
due to an interaction between relation and 
problem type [Group X Problem Type: F(1,
38) = 3.1, p > .05] and a complex interaction
among groups, problem type, and validity, 
F(1,38) = 7.4, p <.01. For valid syllogisms, 
the predicted interaction between groups
and problem type is observed, F(1, 38) = 7.7,
p < .01. However, no significant interaction
between groups and problem type is ob- 
served for invalid syllogisms. Here, Group 
CB syllogisms are easier to solve than Group 
IN syllogisms, F(1, 38) = 6.1, p < .05, and 
neither group shows a significant effect of
conversion (i.e., problem type) for invalid
syllogisms. While this is as it should be for
Group CB problems, the lower accuracy on
Group IN problems (predicted by the conversion
model) may have produced a regression
effect working against an appreciable
difference in scores on SAMES-N and
DIFFERENTS. Note, however, that the
major difference between the two groups is
on those problems where the conversion
model predicts that the encoding of the
premises is critical: valid DIFFERENTS
syllogisms.

The foregoing paired comparisons were 
motivated by specific predictions from the 
model and show it to be substantially correct 
in describing regularities in reasoners’ deci- 
sions, though valid syllogisms are more ac- 
curately described than invalid ones. In 
addition to these specific predictions, the 
model makes very general statements about 
the relative reasoning level among all four 
groups. For valid SAMES, the model is cor- 
rect in predicting that reasoners will show 
the same level of accuracy across all four 
groups, F(3, 76) = .82 (ns), although it is
incorrect in claiming that the accuracy will
be 100%. The model correctly predicts that
reasoning accuracy will not differ across the
convertible groups (AB, CN, and IN) on
valid DIFFERENTS (0% correct is predicted),
F(2, 57) = .94 (ns); and that when a blocked
group is included in the analysis (Group CB),
there will be a difference across groups,
F(3, 76) = 22.8, p < .01. In contrast, the global
predictions for invalid syllogisms are not
supported by the data: (a) Invalid SAMES-N
syllogisms are not equivalent across convertible
groups, F(2, 57) = 5.7, p < .01.
These deviations from the model’s predictions
on invalid syllogisms have been explained
earlier and do not provide a substantial
challenge to the underlying assumptions
of the model, though it is incumbent
on future elaborations of the model to
account for these findings.


Table 3
Percentage of Reasoning Accuracy for Groups
with Different Candidate Relations (Set 2)

                                  Valid                 Invalid
Expressed relation                SAMES   DIFFERENTS    SAMES-N   DIFFERENTS
Abstract (Group AB)               58.2      11.7         10.7        9.3
Convertible (Group CN)            32.5      12.5         30.0        7.5
Blocked (Group CB)                48.8      13.5         20.0        6.7
Arbitrary inclusion (Group IN)    42.5       1.3         16.3        3.3


Taken together, these results lend support 
to the claims of the conversion model that (a) 
when a conversion operation is free to par- 
ticipate in the encoding of quantified rela- 
tions, reasoners exhibit the error pattern 
predicted by the model; and (b) when prag- 
matic factors block conversion at encoding, 
reasoners' accuracy does not differ among
the types of problems.


Translation of Content 


In order to assess whether students are
sensitive to prior concrete instances when
they solve abstract syllogisms, each of the
groups was given the same second set of
16 abstract syllogisms. If students’ reasoning
is influenced by the preceding experience,
they should show the same pattern of
decision errors when solving the abstract
syllogisms as on the earlier concrete ones.
If, however, encoding obligatorily entails
conversion, then no difference among the
groups should be observed, and all groups
should show the standard conversion effect.

This translation hypothesis was evaluated
in two ways. First, a series of analyses was
performed on the data for each group separately
to ascertain whether there was a conversion
effect. The results of
these tests are presented in Table 3 and show
that a significant conversion effect was 
present for each group on the second set of 
16 problems: For Group AB, F(1, 19) =
21.4, p < .001; for Group CN, F(1, 19) = 15.3,
p < .001; for Group CB, F(1, 19) = 16.9, p <
.001; and for Group IN, F(1, 19) = 26.0, p <
.001. This is just what would be predicted
if conversion were normally operating on
symbolic materials. In addition, three
groups show a significant validity effect, that
is, valid syllogisms were easier to solve than
invalid syllogisms: For Group AB, F(1, 19)
= 17.9, p < .001; for Group CB, F(1, 19) =
7.0, p < .05; and for Group IN, F(1, 19) = 4.2,
p > .05 (Group CN was nonsignificant).
The second aspect of the evaluation 
compared the groups as they were compared 
on the initial set of problems: Groups AB- 
CN, CN-CB, CN-IN, and CB-IN. In con- 
trast to the previous results, there were no 
significant differences between the pairs of 
groups nor were there any significant Group 
X Problem Type interactions for any of the 
comparisons. Only the Groups AB-CN 
comparison showed interactions between 
groups and other factors [Validity: F(1, 38) 
= 5.5, p < .05; Validity X Problem Type:
F(1, 38) = 10.4, p < .001]. These effects 
result from the absence of a significant 
problem-type effect for Group AB invalid 
syllogisms. This is similar to performance 
on the first 16 abstract problems, where only 
a marginal conversion effect for invalid syl- 
logisms was found. This may be a conse- 
quence of the slightly lower accuracy on in- 
valid SAMES-N problems than was predicted 
by the model (20% predicted compared with 
13.5% for the first 16 problems and 10.7% for 
the second 16 problems). Clearly, students’ 
accuracy on invalid syllogisms poses a 
problem for empirical tests for the conver- 
sion model, since performance is predicted 
to be close to chance for SAMES-N syllogisms 
and approximately 0% for DIFFERENTS. 
Since the margin is only 20%, slight devia-
tions from these predictions can result in no
effect being observed. Despite this slight 
regression effect, these separate evaluations 
suggest that if reasoners are translating from 
abstract to concrete interpretations at the 
encoding of quantified sentences, the im- 
mediately preceding experience with the 
relations has no observable effect on logical 
decisions under the conditions of the present
experiment. 


Discussion 


The present study examined two aspects 
of the conversion model of formal inference.
First, data support the model’s contention 
that conversion may be a normal part of the 
encoding of quantified category relations: 
The reasoner's accuracy in solving syllogisms 
varied with the kind of sentences presented 
on those problems where the conversion 
model claims that conversion has its effects, 
that is, with DIFFERENTS syllogisms. 

Second, the data lend qualified support to 
the argument that conversion is the basic 
reading given to sentences and is not simply 
one of many available encodings from which 
a reasoner may select. The reasoner’s encoding
of quantified sentences does not appear
to be readily conditioned by the immediately
preceding encoding. Had the
preceding encoding predicted the later performance,
it would have been a sufficient
condition for showing that conversion was 
available to the reasoner as an encoding op- 
tion rather than being obligatorily required. 
Of course, 16 problems may be insufficient 
to reveal subtle transfer effects. A more 
definite resolution of the matter will have to 
await further refinements of the paradigm. 


Predictive Accuracy of the Model 


The foregoing demonstrates the conversion
model’s ability to account for interesting
regularities in the reasoner’s accuracy in
solving categorical syllogisms. But what of
its ability to accurately predict the precise
decision that reasoners draw? To assess the
predictive accuracy of the model on the first
16 problems, we first calculated the percentage
of decisions that coincide with the
acceptance of each of the five conclusions.
The results are given in Table 4, where the
starred responses represent the decision
predicted by the model for each group.¹


¹ The conversion model accepts either of two conclusions
as confirming in some instances but not in others:
This is a result of the logician's special definition for the
quantificational terms (see Revlis, 1975a).

Table 4
Percentage of Each Type of Response for Each Syllogism

Expressed          Response choice                     Response choice             Correctly
relation(a)    1      2      3      4      5       1      2      3      4      5   predicted

Valid SAMES
                          EI-1                                EI-2
AB            2.6    7.9   21.1   60.5*   7.9      0     19.4   16.7   47.2*  16.7    53.6
CN            0     10.0   10.0   60.0*  20.0      0     15.0   15.0   45.0*  25.0    52.5
CB            2.5    5.0    2.5   72.5*  17.5      0     10.5   15.8   63.2*  10.5    68.0
IN            2.5   15.0   15.0   55.0*  12.5      0     15.0   22.5   52.5*  10.0    53.8

Valid DIFFERENTS
                          EA-3                                AA-4
AB            0     60.5*  13.2   18.4*   7.9     81.6*   0      7.9*   2.6    7.9    84.2
CN            2.5   80.0*  10.0    2.5*   5.0     47.5*   5.0   32.5*   5.0   10.0    81.3
CB            0     19.4   16.7   41.7*  22.2     25.0    2.5   65.0*   0      7.5    53.4
IN            ...   ...    ...    10.0*  12.5     77.5*   5.0   10.0*   0      7.5    82.5

Invalid SAMES-N
                          IE-1                                OA-1
AB            2.7   37.8    8.1   29.7   21.6      2.6    7.9   28.9   55.3    5.3
CN            0     28.2   10.3    2.6   58.9      5.0    5.0   32.5   50.0    7.5
CB            2.5   12.5   20.0   12.5   52.5      0      2.5   28.2   33.3   35.9
IN            ... 32.5 ... 2.5 52.5 32.5 12.5 ...

Invalid DIFFERENTS
                          AE-1                                AI-2
AB            ...   ...    ...    ...    ...       0      0     81.6*  10.5    7.9    77.7
CN            ... 0 ... 82.5
CB            5.3 18.4 13.2 26.3 35.4 2.5 ...
IN            ... 0 52.5* 20.0 20.0 ... 53.8

Note. In the standard notation, A and E indicate universal affirmative and
negative sentences, respectively; I and O indicate particular affirmative and
negative sentences, respectively. Since there are four possible orders of
subject and predicate terms, each syllogism is identified by two letters and
a number indicating the order.
(a) The AB group solved problems with abstract sentences, where letters
stood for class terms and where conversion of sentences ... problems with
concrete sentences, where conversion of ... or unacceptable (Group CB)
implications.


Table 4 reveals three characteristics of the
present findings. First, where the model can
specify a decision for a syllogism (valid
SAMES and valid and invalid DIFFERENTS),
the dominant response is the one predicted
by the model. Second, the model correctly
predicts changes in the dominant response
as a function of the syllogism’s content
rather than its logical form (CB group).
Third, other principles than those included
in the model are necessary to account for all
of the decisions observed, since approximately
one third of the decisions do not accord
with the model, and randomness does
not appear to be a property of SAMES-N
problems. Notice too that the data do not
totally reflect the levels of accuracy predicted
(SAMES should be 100%, DIFFERENTS,
0%, and SAMES-N, 20%).

The present evaluation of the model has
been in terms of aggregate group data. To
gain a better understanding of how representative
the group data are of individual
decision making, the model's accuracy was
examined separately for individuals in all
groups. To do this, we determined how
many subjects consistently responded in the
way that the model claims. This was done
for all groups on the first set of problems by
characterizing an individual as promodel if
at least two of the four responses given for
each problem type were in complete agreement
with the model’s predictions. A promodel
person on each problem type would

Table 5
Percentage of Promodel Reasoners

                                  Valid    Valid        Invalid
Expressed relation                SAMES    DIFFERENTS   DIFFERENTS   Overall
Abstract (Group AB)                65        95           95         60 (95)
Convertible (Group CN)             45       100           60         45 (95)
Blocked (Group CB)                 85       100           -          85 (85)
Arbitrary inclusion (Group IN)     65        90           40         40 (75)

Note. Numbers in parentheses result from using the .05 level
of significance.


therefore agree with the model on at least 6 
of 12 critical problems. This criterion cor- 
responds to approximately the .01 level of 
significance across problems. 
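
The promodel criterion described above can be sketched in a few lines of Python. This is only an illustration: the function names, the example response record, and the guessing rate of one in five (a single predicted conclusion among five response choices) are assumptions made here, not details reported in the study, and the study does not say how its significance level was derived.

```python
from math import comb


def is_promodel(hits_by_type, min_hits=2):
    """Return True if every problem type has at least `min_hits` agreements.

    hits_by_type maps a problem-type label to a list of booleans, one per
    response, marking whether that response matched the model's prediction.
    """
    return all(sum(hits) >= min_hits for hits in hits_by_type.values())


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical response record for one reasoner (four responses per type).
reasoner = {
    "valid SAMES": [True, True, False, True],
    "valid DIFFERENTS": [True, False, True, False],
    "invalid DIFFERENTS": [False, True, True, True],
}

# Assumed guessing rate: one predicted conclusion out of five choices.
p_guess = 1 / 5
per_type = binom_tail(4, 2, p_guess)  # chance of >= 2 of 4 on one type
all_types = per_type ** 3             # chance of meeting it on all three
```

Under these assumptions a pure guesser meets the criterion on all three problem types with probability below .01, the same order of magnitude as the level cited in the text. Where the model accepts either of two conclusions, the guessing rate and hence this probability would be higher.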

Table 5 presents the results of this anal-
ysis and shows that most subjects in each
group met the criterion for some problem 
types, and many subjects in each group met 
the criterion across all problem types (58% 
if the .01 level is used for the 12 problems or 
88% if the .05 level is used). Clearly, the 
model’s accuracy is supported both by the 
individual and the aggregate data. 

These findings have pedagogic implica- 
tions for efforts at explaining and improving 
reasoners’ accuracy on logical reasoning 
tasks. If we grant the plausibility of the 
conversion model, it is clear why previous 
efforts at training students against encoding 
sentences with a converted meaning were 
equivocal at best (e.g., Dickstein, 1975; 
Simpson & Johnson, 1966). First, conver- 

sion may be a natural part of the encoding of 
English sentences, and as such, simple in- 
structions-by-example may not be effective 
in blocking conversion, at least under the 
type of conditions employed here. Second, 
the effect of instructions against conversion 
has been poorly evaluated in the past, since 
investigators looked for a dramatic effect of 
instructions by averaging performance over 
all problems. In contrast, the conversion 
model claims that the effect should be lo- 
calized to a specific set of valid and invalid 
syllogisms (i.e., DIFFERENTS syllogisms).
Indeed, the model not only specifies the type
of problems where the effect should be localized,
it also suggests the range of materials
that might be useful in pointing out the importance
of the student’s encoding of the
syllogistic premises. It is expected that
further research on this model may have 
something to say about the use of such syl- 
logisms on aptitude and achievement 
tests. 


Conclusions 


The present study evaluates the viability 
of the conversion model as an explanation for 
the decisions reached on concrete and ab- 
stract syllogisms. The model asserts that 
the decisions rationally follow from the 
reasoners’ understanding of the materials 
and that this understanding is of a piece with 
the way students normally encode quantified 
English sentences. 

Support for this model and its underlying 
assumptions is provided both by its ability 
to account for previously observed patterns 
on reasoners’ decisions (the effects of con- 
creteness and validity), by its ability to pre- 
dict the absence of a transfer from concrete 
to symbolic problems, and by its modest 
ability to accurately predict the decisions 
reasoners will reach on a critical set of syl- 
logisms. 

Taken together, these findings support the 
usefulness of further examinations of the 
conversion model as a model of formal rea- 
soning and as a guide for instructional eval- 
uation. 


Reference Note 


1. Erickson, J. R. Memory search in formal syllogistic
reasoning. Paper presented at the meeting of the
Midwestern Psychological Association, May 1972.
Depth of Processing and Interference Effects in the
Learning and Remembering of Sentences


Janet Hidde Kane 
Queens College of the CUNY 


In two experiments, subjects who completed the last words of sentences they 
read learned more than subjects who simply read whole sentences. This facil-
itation was observed even with a list of sentences that were almost always com-
pleted with the wrong words. However, proactive interference attributable
to acquisition errors appeared on recall and recognition tests administered
after a 1-week interval.


On a wide range of verbal tasks—includ-
ing word lists (Hyde & Jenkins, 1969), sen-
tences defining unfamiliar words (Anderson
& Kulhavy, 1972), and prose passages 
(Schallert, 1976)—performance is strongly
facilitated by diverse procedures that would
appear to have in common only that subjects 
are caused to give meaningful representa- 
tions to the words. This has come to be 
known as the depth-of-processing effect 
(Craik & Lockhart, 1972). One study from 
the genre will be detailed because it involved 
the same paradigm as the present research. 
Anderson, Goldberg, and Hidde (1971) 
prepared sentences such that in each the last 
word was semantically determined by the 
rest of the sentence, for instance, Elevators 
stop at every floor. Subjects who filled
blanks in place of the last words of sentences 
they read aloud learned significantly more 
than subjects who read aloud whole sen- 
tences. The explanation for this result is 
that completing a sentence forces a person 
to process the other words meaningfully, 
whereas a person can "read," that is, decode 
into speech, a whole sentence without com- 
prehending it. The investigators said, 


Consider the incomplete statement, Elevators stop at 
every —. To complete the sentence with the word 
floor requires a person to bring to mind, in however
fleeting a form, a meaningful representation of the rest
of the sentence. Simply translating the printed words
into speech will not suffice, because the mere sound of 
the other words cannot evoke floor. Floor is semanti- 
cally rather than acoustically related to the rest of the 
sentence. (p. 396) 


The idea of depth of processing now enjoys 
wide currency in education. One technique
to make more likely “deep” processing of 
text material is to ask the student thought- 
provoking questions (Anderson & Biddle, 
1975). Studies demonstrate that what is 
learned from a text is influenced by the ways 
in which the learner must study or elaborate 
the passage in order to answer the accom-
panying questions. For example, readers
who receive questions that require applying 
a principle to new examples perform better
on a subsequent test than readers asked 
otherwise identical questions that require 
applying the principle to the examples used 
as illustrations in the text (Watts & Ander- 
son, 1971). Similarly, people asked para- 
phrased questions remember more than 
people given questions that repeat sentences 
verbatim (Andre & Sola, 1976). Questions 
that involve application to new examples, 
paraphrase, or inferences that go beyond the
text can be argued to require deeper pro- 
cessing. But, unfortunately, these sorts of 
questions are more difficult than verbatim 
questions. There is a lower probability that 
students will answer them correctly. 

The issue the present research addressed
is whether engaging in a task that appears to
increase the likelihood of meaningful processing
will be facilitative when the task also
gives rise to frequent errors. Pairs of sentences
containing the same subject noun and
last word were constructed. When given a
“determined” sentence stem, subjects con- 
sistently supplied the same last word to 
complete the sentence. For example, all 
subjects responded desk to complete this 
stem: The executive sat behind his large oak 
—. When presented the companion “undetermined”
stem, The executive went to
shop for a new —, the subjects supplied
many different words, including tie, car, suit,
and pen. No one produced
desk.

Subjects first supplied a word to complete 
a sentence and then were shown the sentence
with the word the experimenter had chosen 
to complete the sentence. They were told to 
read the sentence aloud, trying to guess the 
correct word, and then to learn the experi- 
menter's version of the sentence. Control 
subjects simply read the sentences. The 
sentence completion task was expected to
improve the learning of determined sen- 
tences, as it had in the previous studies, since 
meaningful processing is assured. However, 
when the sentences were undetermined, the 
sentence completion task was expected to 
disrupt learning of the experimenter's ver- 
sion. Subjects will almost never complete 
these sentences with the word intended by 
the experimenter. The wrong answers
should interfere with learning the correct 
versions. 


Experiment 1 


Subjects. Ninety-six undergraduate students enrolled
in an introductory educational psychology course
participated in this study to fulfill part of the course
requirements. The subjects were randomly assigned
to the treatment conditions at the time of testing, with
the restriction that all cells of the design include the
same number of subjects before another subject was
added to any cell.
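
The restricted random assignment just described (no cell receives another subject until every cell has the same number) amounts to randomizing in complete blocks. A minimal Python sketch follows, assuming six cells formed by crossing the two experimental tasks with the three list types; the function name, the seed, and the data layout are hypothetical and not taken from the original procedure.

```python
import random
from collections import Counter


def assign_in_blocks(n_subjects, cells, seed=None):
    """Assign subjects to cells in shuffled complete blocks.

    Each block contains every cell exactly once in random order, so all
    cells hold the same number of subjects before any cell gets another.
    """
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_subjects:
        block = list(cells)
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_subjects]


TASKS = ["reading-only", "sentence completion"]
LIST_TYPES = ["determined", "undetermined", "mixed"]
CELLS = [(task, list_type) for task in TASKS for list_type in LIST_TYPES]

# One cell label per subject, in order of arrival at the testing session.
schedule = assign_in_blocks(96, CELLS, seed=0)
counts = Counter(schedule)
```

With 96 subjects and six cells, every cell ends up with 16 subjects regardless of the shuffle order.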
Design. The two main factors in the experiment
were experimental task and list type. Experimental
task was defined by two levels: In the reading-only
condition, subjects saw the completed sentence and read
it aloud, and in the sentence completion condition they
saw the sentence with a blank in place of the last word
and supplied a word to complete the sentence. List
type had three levels: determined, undetermined, and
mixed. The determined lists were made of sentences
that were constructed so all subjects would report the 
same last word to complete the sentence, while sen- 
tences on the undetermined list prompted a variety of 
final words. The mixed lists included both types of 
sentences. Both experimental task and list type were 
between-subjects factors. 

Each subject studied two lists. The order of lists was 
counterbalanced within each treatment condition. 
Study of each list was followed by two tests. The for- 
ward test presented the subject noun and required the 
subject to reply with the last word of the sentence, and 
the backward test presented the last word of the sen- 
tence and asked the subject to report the sentence’s 
subject noun. The order of the two tests was the same 
across both lists and was counterbalanced between 
subjects. The backward test was included as a check 
on the results of the forward test. If the sentence 
completion group scored higher on a forward test than 
the reading-only group, it might be proposed that the 
sentence completion group benefited from an uninter- 
esting form of positive transfer from the study task to 
the test, since the two activities entail similar response 
demands in the sentence completion condition. If the
sentence completion group also scored higher on the
backward test, the
most credible interpretation would be that the sentence
completion group learned more sentences. 

Materials. The determined sentences were chosen 
from sentences used in the earlier study (Anderson, 
Goldberg, & Hidde, 1971). In that study, undergrad- 
uates presented with sentences that had a blank in place 
of the last word were instructed to complete the sen- 
tence with the word that “most obviously fit the 
meaning of the sentence." A sentence was considered 
determined if 97%-100% of the norming sample used the
same word to complete the sentence. 

For the study reported here, a set of undetermined
sentences was created. These sentences used the same
subject noun-last word pairs as the determined sentences,
embedded in a different sentence context
so the last word could not be predicted
from the first part of the sentence. In order to verify
that the last word was indeterminate, the sentences
were given to 51 students enrolled in an introductory
educational psychology class. Each sentence had a
blank in place of the last word, and the students were
instructed to fill in a word that sensibly completed the
sentence. From among the 63 sentences normed, sentences
were selected according to the criterion that no
more than 50% of the norming sample fill any blank with
the same word. The average proportion with which the
correct word was supplied was ... for the
undetermined sentences used in the experiment.
Two more examples of sentence pairs follow. The
determined sentence is listed first.


The dove is a symbol of peace.
The dove appeared when the magician said peace.


The physician noted the time on his wristwatch. 
The physician asked the patient if he had a watch. 
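
The two norming criteria described above (97%-100% agreement on one word for a determined sentence, and no more than 50% agreement on any one word for an undetermined sentence) can be expressed as a short Python sketch. The function names and the example completion counts are hypothetical illustrations, not data from the norming samples.

```python
from collections import Counter


def modal_proportion(completions):
    """Share of respondents who gave the single most common completion."""
    counts = Counter(word.strip().lower() for word in completions)
    return counts.most_common(1)[0][1] / len(completions)


def is_determined(completions, threshold=0.97):
    """Determined: at least 97% of respondents supply the same word."""
    return modal_proportion(completions) >= threshold


def is_undetermined(completions, max_share=0.50):
    """Undetermined: no single word is supplied by more than 50%."""
    return modal_proportion(completions) <= max_share


def target_proportion(completions, target):
    """Proportion of respondents who supplied the experimenter's word."""
    hits = sum(word.strip().lower() == target.lower() for word in completions)
    return hits / len(completions)


# Hypothetical norming responses for the two "executive" stems.
oak_stem = ["desk"] * 100
shop_stem = ["tie"] * 20 + ["car"] * 15 + ["suit"] * 10 + ["pen"] * 6

oak_is_determined = is_determined(oak_stem)
shop_is_undetermined = is_undetermined(shop_stem)
shop_target_rate = target_proportion(shop_stem, "desk")
```

Averaging `target_proportion` over the selected sentences would give the kind of summary figure the text reports for the proportion of respondents who supplied the experimenter's word.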


The undetermined sentences were arranged in two 
lists of 24 with care to minimize intralist similarity. 
sentence completion experiments are only
superficially different from a paired-asso- 
ciate experiment within the proactive para- 
digm. In any case, it would be hard to dis- 
suade us from trying to milk the analogy on 
the grounds that the correspondence is less 
than perfect because we have previously 
found analogies with paired-associate in- 
terference paradigms quite fruitful (An- 
derson & Myrow, 1971; Kulhavy & Ander- 
son, 1972; Surber & Anderson, 1975). 

How could the failure to find negative 
transfer in the first experiment be explained 
within the framework of interference theory? 
A plausible answer is that it was easy for a 
subject to differentiate between his word and 
the correct word. He produces his word and, 
in contrast to the correct word, it never ap-
pears in print. If the subject can recall both
his word and the experimenter’s word and 
can accurately distinguish his response from 
the experimenter’s, little negative transfer 
would be evident. 

The words the subject produces himself 
are still a potential source of interference, 
however, that might manifest itself under 
some conditions. One such condition is 
delayed retention. Since both the deter- 
mined and the undetermined sentence 
groups must learn the experimenter’s sen- 
tence, the task for the former group is A-C, 
A-C, recall A-C and the task for the latter 
group is A-B, A-C, recall A-C. This ar-
rangement corresponds to the classical
proactive inhibition paradigm. Recall per- 
formance in the undetermined condition 
should suffer from interference from A-B.
Proactive interference effects increase with 
the length of the interval between learning 
A-C and recalling A-C. It can be argued 
that these effects were not apparent on the 
immediate recall test used in Experiment 1 
because of the very short interval. Experi- 
ment 2 tested the hypothesis that the un- 
determined sentence completion group 
would score lower on a delayed retention test 
than the determined sentence completion
group because of interference from the words 
the subjects supplied during the study trial. 
A specific hypothesis about the reason for 
the predicted interference after a delay is 
response competition, caused by loss of dif-
ferentiability of the subject's words and the
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correct words. Another specific hypothesis 
about the reason for interference is discussed
later. 


Method 


Subjects. Forty-four undergraduate students en-
rolled in an introductory educational psychology course
participated in this study to fulfill part of the course 
requirements. Two other students were eliminated 
from the sample because they did not return for the 
retention test, and three others were eliminated because 
they did not follow the directions. Subjects were ran-
domly assigned to experimental treatments when they
arrived for the experiment.

Design. The study used a 2 × 2 analysis of variance
design with factors of experimental task and sentence 
type. As in Experiment 1, there were sentence com- 
pletion and reading-only groups. The factor of sen- 
tence type included two levels: determined and un- 
determined sentences. 

Materials. From the 48 undetermined sentences 
used in Experiment 1, 30 were selected to minimize 
similarities among subject noun-last word pairs. (For
example, one sentence used the word child as the
subject, and another used children as the subject, so one
of these was eliminated.) For the undetermined sen-
tences selected, the average proportion of norming
group subjects who supplied the experimenter's word 
to complete the sentence was .05. The set of deter- 
mined sentences consisted of the same subject noun- 
last word pairs, but the set used a different context so 
the last word was consistently predictable from the 
sentence stem. 

For the immediate and the delayed recall tests, the 
subject noun from each sentence was presented on a
separate index card and subjects were instructed to
report the last word of the corresponding sentence.
Since the determined and the undetermined sentences
were constructed from the same set of subject noun-
last word pairs, the test items were identical for all
groups and the response scored as correct for each 
subject noun was the same for all groups. The order of 
test items was random, with the restriction on the im- 
mediate test that the first half of the test include only 
items from the first half of the study list. 
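
The ordering restriction on the immediate test can be illustrated with a short sketch. It assumes that each half of the study list was shuffled independently and the two halves then concatenated; the text states only the restriction, so the function name, the seed argument, and the shuffling method are hypothetical.

```python
import random


def immediate_test_order(study_list, seed=None):
    """Return a random test order in which the first half of the test
    contains only items from the first half of the study list.

    Assumption: each half is shuffled separately, which is one simple
    way to satisfy the restriction described in the text.
    """
    rng = random.Random(seed)
    half = len(study_list) // 2
    first_half = list(study_list[:half])
    second_half = list(study_list[half:])
    rng.shuffle(first_half)
    rng.shuffle(second_half)
    return first_half + second_half
```

With a 30-item study list, the first 15 test positions would then be filled only by items studied in positions 1 to 15.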

For the delayed recognition test, the subject noun of 
a sentence was presented on an index card along with 
three alternative responses. The alternatives included
(a) the correct response, (b) a correct response to an- 
other item, (c) the most frequently reported incorrect 
response for the undetermined form of the sentence, 
and (d) the most frequently reported incorrect response 
for the undetermined form of another randomly chosen 
sentence. A unique form of the delayed recognition test 
was created for each subject who received undetermined
sentences and the sentence completion task. Alter-
native c was replaced with the word the subject had
reported during the study interval for the corresponding
item. This procedure was necessary because interfer-
ence effects are not evident on a recognition test unless
the particular competing responses are among the al-
ternatives (Anderson & Watts, 1971). The alternatives
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were arranged in a random order for each test item. 
The order of items within the test was identical for all
subjects.
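
The construction of one recognition item, including the individualized replacement of alternative c, can be sketched as follows. This is only an illustration: the function and parameter names are hypothetical, and falling back to the normative error when a subject supplied no word of his own is an assumption, since the text does not describe that case.

```python
import random


def recognition_alternatives(correct, other_correct, modal_error,
                             other_modal_error, own_word=None, seed=None):
    """Assemble the alternatives (a) through (d) for one recognition item.

    own_word is the word the subject reported during the study trial
    (undetermined sentence completion group only). When it is given, it
    replaces alternative c; otherwise the most frequent normative error
    is used (an assumption for subjects with no word of their own).
    """
    alternative_c = own_word if own_word is not None else modal_error
    alternatives = [correct, other_correct, alternative_c, other_modal_error]
    # The alternatives were arranged in a random order for each test item.
    random.Random(seed).shuffle(alternatives)
    return alternatives
```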

Procedure. Each subject completed two practice
items before studying the set of 30 sentences. The
sentences were presented one at a time in the window 
of a memory drum. Subjects in the reading-only groups
saw the entire sentence for 8 sec, while subjects in the 
sentence completion group saw the sentence with a 
blank in place of the last word for 4 sec and then the 
completed sentence for 4 sec. 

In the sentence completion groups, subjects read the 
entire sentence stem aloud and tried to guess the word 
the experimenter had chosen to complete the sentence.
The experimenter recorded the subject's response.
After 4 sec, the completed sentence was presented for 
another 4 sec, and the subject read the entire sentence 
aloud. 

In the reading-only groups, the entire sentence ap- 
peared in the window of the memory drum for 8 sec, 
during which each subject read it aloud. 

After the sentences had been presented, the subject
completed the immediate recall test. When finished,
subjects were asked to return at the same time a week 
later for another experiment. Some subjects asked if 
the experiment would cover the same material. They 
were told the procedures would be similar but not 
identical. The night before the delayed tests, each 
subject was called to remind him of his appointment. 


Results and Discussion 


Immediate recall. A 2 × 2 analysis of
variance of immediate recall scores identified
a significant main effect for experimental
task, F(1, 40) = 13.35, p < .01, but no effect
for sentence type and no significant inter-
action. The mean proportions are presented
in Table 2. The results on the immediate
recall test replicate the findings of Experi-
ment 1. When subjects supply a word to
complete a sentence, learning is facilitated,
regardless of the match between the subject's
word and the experimenter’s word. The
absence of an effect for sentence type
indicates that both lists were equally learn-
able.

Subjects in the undetermined sentence
completion group were expected to complete
the sentences with words other than those
chosen by the experimenter. This did not
occur for each item, however. Sometimes,
subjects did not report any word during the
study trial. The mean proportion, P, of
items in which this happened was .15. Oc-
casionally, the subject gave the correct word
(P = .09). Competition would be possible
only on those items for which the subjects
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Table 2 
Mean Proportion Correct on Each Test in
Experiment 2

Experimental task        Immediate   Delayed   Delayed
and sentence type        recall      recall    recognition

Sentence completion
  Determined             .84         .50       .88
  Undetermined           .83         .40       m
Reading only
  Determined             .69         .26       .79
  Undetermined           .65         .28       416


Undetermined ^ 79 Z. 


reported an incorrect word during the study 
trial. Subjects in the determined sentence 
completion group were expected to fill the 
blank with the word chosen by the experi- 
menter, but occasionally they suggested a 
different word during the study trial (P =
.02).

For the undetermined sentence comple- 
tion group, the conditional probability of 
reporting a correct response (R2) on the 
immediate test, given that a wrong response 
(W1) was reported during the study trial, 
P (R2|W1), was computed. This was com- 
pared with the conditional probability of 
reporting a correct answer on the immediate 
test given that the correct word was reported 
during the study trial, P (R2|R1), for the 
determined sentence completion group. If
supplying different words results in negative
transfer to the task of learning the experi-
menter’s sentence, the undetermined sen-
tence completion group should recall fewer
of the items that fit the interference para-
digm than subjects in the determined sen-
tence completion group. P (R2|W1)
was .85 and P (R2|R1) was .83, so there was
actually a slight trend in the direction of
positive transfer.

Delayed recall. The analysis of variance 
for this set of scores also shows a significant 
main effect for the experimental task, F(1, 
40) = 13.18, p < .01, but no significant effect 
for sentence type or the interaction. The 
task requiring subjects to comprehend the 
sentence results in higher retention test 
scores than the reading-only control group 
after a 1-week retention interval.

The most sensitive test for proactive in- 
hibition includes just those cases in which 
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the specific conditions required for inter- 
ference are present. Within the undeter- 
mined sentence completion group, the con- 
ditional probability of correct recall on the 
delayed test (R3), given that a wrong 
response was supplied during the study trial 
and the correct response was given on the 
immediate recall test, was computed for each 
subject. This conditional probability, 
P (R3|W1R2), reflects just the set of cir-
cumstances that define proactive inhibition.
The subject reported A-B during the study 
trial and had learned A-C as evidenced by 
his performance on the immediate test. If 
the subject did not correctly answer the item 
on the immediate test or if he matched the 
experimenter's word during the study trial, 
the specific conditions for proactive inhibi-
tion were not met. For the determined
sentence completion group, the conditional 
probability of correct delayed recall, given 
that the correct response was reported dur- 
ing the study trial and a correct response was 
given on the immediate test, was computed 
for each subject. This value, P (R3|R1R2), 
represents the conditions in which no 
proactive inhibition is expected and serves 
as a standard of comparison for the perfor-
mance of the undetermined sentence com-
pletion group. 
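
The per-subject conditional probabilities described above can be computed as in the following sketch. The record layout (one dictionary per item with the keys "study", "immediate", and "delayed") and the function name are hypothetical; items on which no word was reported during the study trial are coded "none" and therefore count toward neither P (R3|W1R2) nor P (R3|R1R2).

```python
def p_delayed_given(items, study_outcome):
    """Conditional probability of correct delayed recall (R3) for one subject.

    items: list of dicts with keys
        "study"     -- "right", "wrong", or "none" (response on the study trial)
        "immediate" -- True if correct on the immediate test (R2)
        "delayed"   -- True if correct on the delayed test (R3)
    study_outcome: "wrong" gives P(R3|W1R2); "right" gives P(R3|R1R2).

    Returns None when the subject has no qualifying items.
    """
    qualifying = [item for item in items
                  if item["study"] == study_outcome and item["immediate"]]
    if not qualifying:
        return None
    hits = sum(1 for item in qualifying if item["delayed"])
    return hits / len(qualifying)
```

Under these assumptions, a subject in the undetermined sentence completion group would contribute the value returned for "wrong", and a subject in the determined sentence completion group the value returned for "right"; group means of these values are what the comparison rests on.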

If the words the subject supplied during 
the study trial serve as a source of interfer- 
ence for later recall of the correct response, 
then P (R3|W1R2) < P (R3|R1R2). This 
prediction was confirmed. The values 
computed in the manner just explained were 
.44 for the undetermined sentence comple-
tion group and .58 for the determined sen-
tence completion group, which is a signifi-
cant difference, t(20) = 1,
one-tailed test, oa bra 

To further document the effect of inter- 
ference from words reported during the 
study trial, the errors on the retention test 
were itemized. Of the overt errors, 29% were
words supplied during the study interval.
This averaged to 1.45 obviously interfering 
items per subject. 

Delayed recognition. On the delayed 
recognition test, subjects were presented 
with the subject noun of a sentence plus 
three distractors. One of the distractors for 
subjects in the undetermined sentence 
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completion group was the specific word the 
subject had supplied to complete the sentence dur-
ing the study trial. For the determined
sentence group, this distractor was the word
most frequently supplied to the parallel
undetermined sentence. The analysis of
variance for these scores shows no significant
main effects, but the Experimental Task ×
Sentence Type interaction was significant,
F(1, 40) = 6.32, p < .05. A further compar-
ison showed, as predicted, significantly
poorer performance in the undetermined
sentence completion group than in the de-
termined sentence completion group, t(20)
= 2.80, p < .05.

Conditional probabilities of recognition 
were compared for the particular interfering 
items in the undetermined sentence com-
pletion group and the particular noninter-
fering items in the determined sentence 
completion group. This analysis was iden- 
tical to the one done with delayed recall. The 
determined sentence group recognized a
larger proportion of the items (P = .92) than 
the undetermined sentence group (P = .8)) 
and this difference was statistically signifi-
cant with t(20) = 1.78, p < .05, by a one-
tailed test. An analysis of errors indicated
that 93% made by the undetermined sen-
tence group were choices of the words re-
ported during the study trial, a fact very
consistent with the response competition 
interpretation. 


General Discussion 


Both experiments demonstrated that 
when subjects provide the last word to
complete each of a series of sentences, they
learn more than subjects who simply read
whole sentences. This facilitation occurs
regardless of the match between the terms
supplied by the subject and the ones desig-
nated as correct by the experimenter. In
other words, neither experiment gave evi-
dence of negative transfer in the condition
in which the correct last word could not be
predicted and subjects were almost always
wrong. However, Experiment 2 showed that
errors during acquisition have disruptive
consequences for retention after one week.
Proactive interference from the nonmatch-
ing words supplied during the study trial
affected both delayed recall and delayed
recognition in the undetermined sentence
completion group.

The results of the two experiments par-
allel findings from research with paired as-
sociates. The task of the undetermined
sentence group can be represented as A-B,
A-C, recall A-C, while the task of the de-
termined sentence completion group can be
represented as A-C, A-C, recall A-C. The
contrast between these two groups resembles
the paradigm for demonstrating proactive
inhibition in a list experiment, in which re-
sponses learned on the first list compete with
recall of responses learned on the second list.
This model from paired-associate research
accurately predicted relative performance on
the delayed retention tests. 

There is an aspect of the present results 
that is unexpected from the perspective of 
interference theory and paired-associate 
research. The undetermined sentence 
completion group did as well as the deter- 
mined sentence completion group on the 
immediate test but worse on the delayed 
test. Usually negative transfer and proac-
tive inhibition are symmetrical; that is, you
don’t find evidence for one without the
other. This is so generally true that the two
effects are often combined under the same
label. Clearly, however, transfer must be
distinguished from proaction to account for
the present case. We believe that our results
are best explained in terms of the concept of
differentiation. The argument is that
negative transfer did not occur because
during the study trial and immediate test
subjects in the undetermined sentence
completion group were easily able to distin-
guish between the incorrect responses they
had given and the experimenter's intended
responses. This is a very plausible expla-
nation, we maintain, since there are a num-
ber of obvious cues available that would
allow an individual to discriminate a word
he or she had uttered from a printed word.
Later, our argument goes, subjects were less
able to distinguish their words from the ex-
perimenter’s words. Hence, there was a
decrement in performance attributable to
proactive interference.

There is a precedent in the paired-asso-
ciate literature for invoking the concept of
differentiation to explain proaction (Keppel,
1968; Postman & Underwood, 1973). Any 
manipulation that increases the discrimi- 
nability of potentially interfering items at- 
tenuates proactive effects. These manipu- 
lations usually have been applied to whole 
lists, so the factor presumed to underlie the 
effect traditionally has been called list dif- 
ferentiation. However, this phrase would 
not be very apt were it applied to our sen- 
tence completion task, since this task doesn’t 
actually involve separate lists. Hence, we 
are using the simpler expression “differen- 
tiation.” While differentiation is part of the 
conceptual arsenal of interference theory, it 
plays a larger role in the foregoing account 
than in most treatments. To mark this 
emphasis, we shall call our explanation the 
differentiation-interference hypothesis. 

There is another explanation within the 
framework of interference theory that might 
be proposed to account for the fact that the 
undetermined sentence completion group 
did well on the immediate test but relatively 
poorly on the delayed test. According to this 
hypothesis, (a) A-B connections are “un- 
learned” or “extinguished” during the 
learning of A-C connections; (b) therefore, 
the B responses are unavailable during an 
immediate test to interfere with the recall of 
the C terms; (c) however, the B terms 
“spontaneously recover” during the reten- 
tion interval; (d) and hence, the B terms are 
available to compete with the C terms on a 
delayed test. This can be called the avail-
ability-interference hypothesis. It seems a
less likely explanation to us, for it is some- 
what incongruous to maintain that the initial 
responses of the subjects in the two experi- 
ments reported here could have been com- 
pletely extinguished in a single trial. The 
notion of extinction seems to require re- 
peated evocations of a response with unsat- 
isfactory results before the response is 
eventually “unlearned.” The hypothesis
appears more tenable if one moves away 
from the classical conditioning paradigm 
from which these concepts initially were 
borrowed and says that learning A-C “sup- 
presses" A-B or causes B to become “un- 
available." There is considerable support 
for an explanation along these lines (cf. Un- 
derwood, 1957). 
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The experiments reported in this article 
do not permit a choice between the differ- 
entiation and availability explanations. The 
important point that this research does es- 
tablish is that whatever the details of the 
process, people are likely in the long run to 
suffer from interference attributable to 
wrong answers initially given when an- 
swering questions. 

The task in the undetermined sentence
completion condition resembles the in-
structional situation in which a student is
presented with a question and answers it 
incorrectly. Results here suggest that if the 
student is then provided with feedback that 
specifies the correct answer he or she will be 
able to learn that answer, but both the stu- 
dent’s wrong answer and the correct answer 
will compete on a retention test. One way 
to avoid or minimize this interference would 
be to prevent errors by carefully structuring 
the questions within a precise instructional 
sequence. Another way to minimize inter- 
ference effects would be to provide further 
practice with the question and the correct 
response any time the student answers a 
question incorrectly. The Distar Reading 
Program (see Siegel, 1976) includes such an 
error correction procedure. When a child or 
group of children respond incorrectly to a 
question, the teacher is instructed to give the 
correct response and then to repeat the 
question and have the students supply the 
answer. Siegel (1976) showed that teachers 
who consistently used this sequence had 
classes who scored higher on unit achieve- 
ment tests than teachers who did not con-
sistently use this correction paradigm. In
addition, when the less effective teachers 
were trained in the use of the correction se- 
quence, their classes subsequently scored 
higher on an achievement test than classes
of matched, untrained teachers. 

We have invoked the concept of depth of 
processing as the most plausible explanation 
now available for why a task such as com- 
pleting a sentence is more facilitative than 
reading the whole sentence. However, there
are other possible explanations. For in- 
stance, it might be supposed that the factor 
underlying the effectiveness of the comple- 
tion task is that it entails more rehearsal of 
a sentence. According to this view, getting
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the meaning from sentences involves holding 
sentence constituents in memory and 
working on them longer or more often than 
would be required simply for pronouncing 
them. The fact that a task involves an 
analysis of meaning would be seen as a co- 
incidental rather than a directly causative 
factor from this perspective. There are no 
reasons internal to the experiments reported 
here for preferring a depth-of-processing 
explanation over an amount-of-rehearsal 
explanation. It should be stressed, though, 
that the interpretation of the results of the 
present studies does not hinge in any crucial 
way on the underlying reasons for the ef- 
fectiveness of tasks that require semantic 
analysis. 

On the delayed recall test one week after
learning, even the undetermined sentence 
completion group recalled more than its 
reading-only control. It is tempting to 
venture the educational implication that the 
advantage to be gained from tasks requiring 
the student to construct meaningful repre- 
sentations for verbal material outweighs any 
performance decrement due to interference 
arising from errors during learning, but we 
shrink from pushing this implication until 
studies are completed using a wide variety of 
materials and a number of different reten- 
tion intervals. 
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Effects of a packaged teacher-consultant-mediated intervention program 
(CLASS) for modifying acting-out behavior in the regular classroom were 
evaluated in two experiments. Consultants were instructed (a) as part of an 
8-week course by two briefly trained college instructors or (b) in a 2-4-day 
workshop by the program's developers. Fifty-four primary-grade experimen-
tal and control children from three school districts were involved. The results
indicated the experimentals, in contrast to the controls, significantly in-
creased their proportion of appropriate behavior postintervention and in the
next academic year (Experiment 2) and required fewer remedial services and
special class placement up to 3 years later. The program's external generali-
zability and cost-effective service-delivery strategy are discussed.


In the last decade, the impact of behav-
ior modification procedures upon the
learning and behavioral problems of children
in the educational setting has been truly 
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impressive. The range of successful appli- 
cations of this powerful technology has been 
noted across differing populations, handi-
capping conditions, and classroom settings
(see reviews by Birnbrauer, 1976; Drabman, 
1976; O’Leary & O'Leary, 1976). Recent 
pressures for accountability and, concomi- 
tantly, the mainstreaming of all handicapped 
children with the passage of Public Law 
94-142 have sparked increased interest in the 
systematic and widespread application of 
behavioral procedures. However, to achieve
these goals, behavioral technology may have
to be “packaged” for implementation by
relatively unsophisticated personnel in
school settings. Further, such packaged
programs should contain not only the basic
intervention procedures necessary for ad-
dressing children's learning and behavioral
problems but also procedures for training
school personnel in their correct imple-
mentation (Tyler, 1978).

Two such packaged programs have been 
successfully developed for the remediation
of the behaviors of acting-out aggressive
children in the academic setting (Kent &
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O'Leary, 1976; Patterson, Reid, Jones, & 
Conger, 1975). However, the Patterson et 
al. (1975) package is primarily a home in-
tervention program with a supplemental
component for children who are also dis-
ruptive in the educational setting. The
Kent and O’Leary (1976) program, designed
for the school setting, also has a major home
component, which limits its applicability to
children of parents who are willing to make
a major investment of time and energy in the
program. Neither program has available a 
set of systematic low-cost procedures for 
training personnel in the implementation of 
the programs; for example, 42 hours were 
required to train BA-level therapists to work 
as psychological assistants to PhDs and
jointly implement the procedures (Kent &
O'Leary, 1977). 

To be practically useful, a packaged pro- 
gram should be demonstrated to be effective 
outside the limited research context in which
much of the behavioral research has been
conducted to date (Kent & O’Leary, 1976).
In most instances, experimenters play major
roles, regularly monitoring and providing
feedback to school personnel and thereby 
ensuring that the procedures are imple- 
mented as intended. Although this certifies 
an accurate test of the procedures in their 
development, it may not test their effec- 
tiveness under more natural nonexperi- 
menter conditions. Few programs have 
been implemented and evaluated by per- 
sonnel working directly in the referral setting 
with a minimum of experimenter monitor-
ing; for example, Kent and O'Leary (1976)
developed their program “in consultation 
with the therapists," leaving open the 
E en element claimed in their re- 

A serious limitation of numerous studies
is the absence of follow-up data indicating
maintenance of treatment effects. Patter-
son (1974) presented observation data for
two groups of subjects at 1-3 months (n = 6)
and at 4-6+ months (n = 5) following ter-
mination of treatment. No decline in ap-
propriate-behavior scores was noted for ei-
ther group. Durability of treatment effects
was shown also by Kent and O'Leary (1976)
for their experimental group 9 months fol-
lowing treatment. However, a significant
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decrease in the deviant-behavior scores of 
their control group eliminated any experi- 
mental-control group differences. As a re- 
sult of these findings, Kent and O'Leary 
(1976) argued that control groups must be an 
essential feature in any design attempting to 
demonstrate long-term effects of treatment. 
Greenwood, Hops, and Walker (1977) also 
presented data indicating that control 
groups may be necessary to account for such 
rival independent variables as setting 
changes, curricula, and so forth. 

Some researchers have increased the 
power of their experiments by including 
comparisons among deviant and normal 
peers in the same settings (Kent & O'Leary, 
1976; Patterson, 1974; Walker & Hops, 
1976). Patterson's treated children in- 
creased their appropriate-behavior scores to 
within the normal peer range, which main- 
tained through follow-up. The initial gains 
following treatment and the absence of ex- 
perimental-control group differences at 
follow-up were noted by Kent and O'Leary 
(1976) using peer-subject differences as their 
dependent measure. Unfortunately, the 
difference score does not indicate whether 
the nonsignificant group effect was due to 
control group increases in deviant behavior 
or decreases in the respective peer groups 
due to uncontrolled setting effects. Data for 
both subjects and peers seem necessary to 
explain the follow-up results accurately. 

The purpose of the present study was to 
investigate the effectiveness of the CLASS 
(Contingencies for Learning Academic and 
Social Skills) Program for acting-out chil- 
dren under field conditions and to concom- 
itantly evaluate two types of teacher-con- 
sultant training procedures. An additional 
purpose of this study was to investigate 
questions relating to the program's gener- 
alizability and to the durability of effects 
produced in child behavior. The CLASS 
Program (Hops, Beickel, and Walker, Note 
1) is a 30-day intervention program designed 
to reduce the inappropriate behavior of 
acting-out children in the regular classroom. 
The program is implemented by a teacher-
consultant who provides direct services in
the classroom and then trains and transfers
control of the program to the classroom
teacher.
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Two experiments are reported in this 
paper. In Experiment 1, 12 consultants 
from a single school district were trained to 
implement the program during an 8-week 
semester course by two college-level in- 
structors relatively sophisticated in behav- 
ioral analysis procedures. Consultants in 
this experiment were provided with free 
tuition. In Experiment 2, 16 consultants 
from two school districts were trained to 
mastery in the CLASS Program procedures 
during 3-4 workshop days. Consultants and 
teachers were paid for their participation 
contingent upon completion of key program 
tasks during the implementation process. 

In both experiments, referred children 
were screened and then randomly assigned 
to either an experimental or a control group. 
Behavioral observation data collected by 
trained observers, naive to experimental 
conditions, provided the major dependent 
measure. In Experiment 1, pre- and 
postintervention data were collected. In 
Experiment 2, observations were collected 
on four occasions: preintervention, during 
intervention, postintervention, and at fol- 
low-up in the next academic year. Experi- 
mental teachers and consultants also com- 
pleted rating scales to indicate their satis- 
faction with the program and its usefulness. 
These data provided evaluation of consumer 
satisfaction (Kazdin, 1977). A long-term 
follow-up was assessed by searching each 
child’s file to determine his or her present 
placement and use of special services.


General Method 
CLASS Program 


vfus proced 
designed to modify the disru; 

peed child in rd regular ptive behavior of the act- 
relies upon a teacher. maroo, 


runs the program in the classroom for 

Although the consultant demonstrates the program's
effectiveness, the teacher is gradually involved and fi- 
nally assumes responsibility for its operation on the 
sixth program day. 

The specific details of the CLASS can be 
found in the procedural Consultant Manual (Hops et 
al., Note 1). The standardized Program package com-
bines a number of treatment components previously
found to be effective in remediating child behavior
problems and is based primarily on the work of Pat-
terson, Cobb, and Ray (1972), Walker, Hops, and Fie-
genbaum (1976), and Walker, Mattson, and Buckley
(1971). These include (a) token economy, (b) response
cost, (c) systematic suspension, (d) contingency con-
tracting, (e) recycling or branching, and (f) minimal
parental involvement. Two additional key ingredients
of the package are the delivery system and procedures
for establishing generalization and maintenance of
treatment effects.


Development centers funded by the Bureau of Educa-
tion for the Handicapped in the U.S. Office of Educa-
tion. The Center’s goal is to develop practical, cost-
effective, and validated programs for managing children
with behavior disorders in school settings. Studies are
conducted within a three-stage development process,
which determines, first, which procedures are effective
in modifying specific behavior disorders within a tightly
controlled experimental class setting; second, whether
the procedures can be implemented in the school setting


room (Walker et al., 1976) and that the adapted proce- 
dures had similar effects in regular classrooms (Hops
& Beickel, Note 2). 

A pilot study was then conducted to develop and
evaluate a set of training procedures for training con-
sultants.¹ A group of 14 consultants was trained to
mastery in two 2-day workshops to implement the
CLASS Program procedures. Each trained consultant
applied the program to one acting-out child enrolled in
a regular classroom. The proportion of appropriate
behavior recorded for the 14 treated children increased
significantly from 63% before treatment to 77% imme-
diately following treatment. The pilot study demon-
strated that consultants could be trained in a relatively
brief period to operate the program successfully. Un-
fortunately, a nonrandom, unmatched control group did
not allow a true test of the experimental procedures.
The present study is designed to externally validate the
program under more rigorous experimental condi-
tions.


Field Test Sites 


Experiments 1 and 2 were carried out in Torrance, 
California, and in Honolulu and Kaneohe, Hawaii, re- 
spectively. 


We wish to acknowledge the contributions of Harry
berg, Charles Richmond, and the consultants and
teachers of the Visalia Unified School District, Visalia,
California, for their participation in this portion of the
study.


In Torrance, teacher-consultants were trained by two
College of Education professors using a preliminary 
draft of the CLASS Consultant Trainer’s Manual 
(Hops, Beickel, Fleischman, & Walker, Note 3). In 
Hawaii, a group of teacher-consultants was trained in 
the CLASS Program procedures by CORBEH trainers 
who had been involved in developing the original pro- 
gram. All trainees were employed as teacher-consul- 
tants in their respective school districts or were in a 
position to serve teachers in this capacity. 


Subjects 


A total of 54 subjects, 27 experimental and 27 control, 
each from a different classroom, participated in the 
experiments.2 The children were drawn from Kin-
dergarten through Grade 2. The preponderance of
males to females (53:3) across both sites was consistent 
with the literature that indicates a higher incidence of 
males referred to special services for disruptive, act-
ing-out behavior in the classroom (Walker & Buckley,
1974; Woody, 1969).


Observation Procedures 


Direct observations were made in each subject’s 
regular classroom by a cadre of professionally trained
observers recruited and trained within each field site. 
The 16-category, 8-sec interval CLASS Program ob- 
servation code (Hops & Nicholes, Note 4) was used to 
define and record subjects’ behavior. Observations 
were recorded for experimental and control subjects 
immediately prior to and following implementation of 
the CLASS Program procedures. In Experiment 2, 
additional observations were made between Day 10 and 
Day 15 (during) and in the fall of the next academic year 
(follow-up). 

Data were collected on each subject and each subject's
same-sex peer group in the same instructional setting
as the subject, for example, independent seat work, in
alternating intervals. A minimum of 1 hour of obser-
vation, in reading and math periods only, was recorded
for each subject in each study phase.

A rotating procedure was used to calibrate agree-
ments between each observer and every other observer.
Agreement coefficients were calculated by scoring each
8-sec interval for the number of agreements between
pairs of observers and dividing by the total number of
events recorded (agreements plus disagreements).
Across phases, the average reliabilities were 92.6 (SD
= 6.3, N = 43) and 88.2 (SD = 7.8, N = 151) for Exper-
iments 1 and 2, respectively.
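
The agreement coefficient described above is a simple ratio of agreements to agreements plus disagreements. A minimal sketch in Python follows, assuming each observer's record is a list holding one category code per 8-sec interval. The function name and the sample codes are illustrative assumptions and are not taken from the CLASS Program observation manual, which may score intervals differently.

```python
def agreement_coefficient(observer_a, observer_b):
    """Percentage agreement: agreements / (agreements + disagreements) * 100."""
    if len(observer_a) != len(observer_b):
        raise ValueError("both observers must code the same intervals")
    if not observer_a:
        raise ValueError("at least one interval is required")
    # An interval counts as an agreement when both observers recorded the same code.
    agreements = sum(1 for a, b in zip(observer_a, observer_b) if a == b)
    disagreements = len(observer_a) - agreements
    return 100.0 * agreements / (agreements + disagreements)


# Hypothetical codes for ten consecutive intervals (not data from the study).
obs_1 = ["attend", "attend", "work", "play", "work",
         "attend", "work", "work", "attend", "play"]
obs_2 = ["attend", "attend", "work", "work", "work",
         "attend", "work", "work", "attend", "play"]

print(agreement_coefficient(obs_1, obs_2))
```

Averaging such coefficients over all observer pairs within a site would give a summary comparable in form to the reliabilities reported above.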


Dependent Measures 


The proportion of appropriate behaviors coded was
the dependent variable used in the present
study. An appropriate-behavior score was derived for
each subject for each phase by dividing the total fre-
quency for code categories determined a priori to be
appropriate by the total frequency for all code
categories.
The appropriate code categories were the following:
attend, talk academic, work, volunteer, management,
and approve. The inappropriate code categories con-
sisted of play, irrelevant talk, look around, inappro-
priate locale, disruptive, physical negative, and disap-
proval.
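
The appropriate-behavior score can be sketched as follows. This is an illustration only: the category names come from the list above, but the dictionary layout, the function name, and the frequencies are assumptions made for the example.

```python
# Code categories designated a priori as appropriate (from the text above).
APPROPRIATE = {"attend", "talk academic", "work", "volunteer",
               "management", "approve"}


def appropriate_behavior_score(frequencies):
    """Proportion of coded behavior that falls in appropriate categories.

    `frequencies` maps a category name to its total frequency in one phase.
    """
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("no behavior was coded in this phase")
    appropriate = sum(count for code, count in frequencies.items()
                      if code in APPROPRIATE)
    return appropriate / total


# Hypothetical frequencies for one subject in one phase.
phase_counts = {"attend": 120, "work": 180, "volunteer": 15,
                "play": 40, "look around": 30, "disruptive": 15}

print(appropriate_behavior_score(phase_counts))
```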


Design and Analysis 


Whereas randomly assigned untreated groups are 
used to control for internal validity and rival indepen- 
dent hypotheses, for example, maturation and regres- 
sion (Campbell & Stanley, 1963), additional variability 
in observed classroom behaviors may be attributable to 
specific classroom stimulus conditions. Classrooms 
vary in the overall amount of appropriate behavior be- 
cause of structure, curricula, and so forth. Walker and 
Hops (1976) suggested that peer data collected on a 
subject's peers can be used to statistically control for 
stimulus conditions peculiar to each classroom. In the 
present study, observation data were recorded on each 
subject’s same-sex peers during each observation period 
and used as a covariate to control for classroom condi- 
tions in the analyses of the behavioral observation 
data. 

Subject observation data were analyzed with a two- 
way analysis of covariance, with repeated measures 
across phases: Treatment (experimental vs. control)
× Phases (preintervention vs. postintervention in Ex-
periment 1; preintervention vs. during vs. postinter-
vention in Experiment 2). Follow-up data in Experi-
ment 2 were collected in the fall of the next academic
year for subjects in new classrooms with different 
teachers and peers. These data were then subjected to 
a separate one-way analysis of covariance. 
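
The idea of adjusting subject means for peer data can be illustrated with the textbook single-covariate adjustment, in which each group mean is shifted by the pooled within-group regression slope times the distance of that group's covariate mean from the grand covariate mean. The sketch below is a generic illustration under that assumption. It is not the repeated-measures analysis of covariance used in the study, and the function name and sample values are hypothetical.

```python
def adjusted_means(groups):
    """Covariate-adjusted outcome means for a one-covariate design.

    `groups` maps a group name to a list of (covariate, outcome) pairs,
    for example (peer proportion, subject proportion).
    """
    def mean(values):
        return sum(values) / len(values)

    grand_x = mean([x for pairs in groups.values() for x, _ in pairs])

    sxy = 0.0  # pooled within-group sum of cross products
    sxx = 0.0  # pooled within-group sum of squares of the covariate
    group_means = {}
    for name, pairs in groups.items():
        mx = mean([x for x, _ in pairs])
        my = mean([y for _, y in pairs])
        group_means[name] = (mx, my)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("the covariate does not vary within groups")

    slope = sxy / sxx  # pooled within-group regression slope
    return {name: my - slope * (mx - grand_x)
            for name, (mx, my) in group_means.items()}


# Hypothetical (peer, subject) proportions, not data from the study.
example = {
    "experimental": [(0.78, 0.80), (0.82, 0.84), (0.80, 0.79)],
    "control": [(0.80, 0.70), (0.84, 0.75), (0.82, 0.71)],
}
print(adjusted_means(example))
```

A group observed in classrooms with unusually well-behaved peers has its mean adjusted downward, and a group observed in less orderly classrooms has its mean adjusted upward, which is the sense in which peer data control for classroom conditions.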


Experiment 1: Using Field Trainers 
to Train Consultants to Apply the 
CLASS Program 


This study was designed to investigate 
whether local personnel, who were relatively 
sophisticated in behavior analysis, could be 
taught to use the CORBEH training proce- 
dures and in turn train teacher-consultants 
to effectively use the CLASS Program. The 
successful demonstration of such a local 
delivery system could greatly enhance the 
practical usefulness of the CLASS Program
and thereby make it available to a larger
pool of potential consultants.

A trainer's manual (Hops et al.,
Note 3), detailing CORBEH's consultant
training procedures and materials, was de-
veloped for use in this experiment. The
participating trainees were from the Tor-
rance School District, Torrance, California,
a suburb of Los Angeles with a population of
approximately 135,000.


2 One control and one experimental subject in Ex-
periments 1 and 2, respectively, moved from their school
districts during each of these studies.


Method 


The consultant trainers were two members of the 
education faculty at California State College, Dom- 
inguez Hills. In a 2-day visit to Eugene, Oregon, they 
were "walked through" the training procedures and 
provided with all the materials required to successfully 
train CLASS Program consultants, that is, manuals, 
transparencies, videotapes, and so forth. 

Twelve consultant trainees, all but one of whom were
special class teachers, were enrolled in a course at the
college and were trained in a series of twice-weekly,
1½-hour meetings. Each trainee was provided with a
copy of the CLASS Program Consultant Manual (Hops 
et al., Note 1), which describes daily program proce- 
dures and instructs the consultant in methods for 
implementing them. The training sessions focused 
upon building trainee competence in the following 
CLASS Program skills: (a) identifying and selecting 
an appropriate child for the program; (b) observing and 
recording on-task behavior in the classroom; (c) en- 
listing the cooperation of the parents, teacher, child, 
peers, and other involved school personnel; (d) running
the program in the classroom; (e) modeling appropriate
behaviors for the teacher; and (f) monitoring the 
teacher's behavior. 

For the purpose of ensuring randomization of subjects 
and equivalence of groups, each trainee selected two 
children according to the CLASS Program criteria, and 
one of each pair was randomly assigned to either the 
experimental or the control group. Each child was in 
a different room with a different teacher. A copy of the 
CLASS Program Teacher Manual (Hops, Fleischman, 
& Beickel, Note 5) was given to every experimental
teacher to use as a reference and guide.

The consultant trainees’ tuition for the course was
paid by CORBEH, and their grades were dependent 
upon their performance as evaluated by the course in- 
structors. Each trainee was allowed 10 hours of release 
time from classroom teaching responsibilities to im- 
plement the CLASS Program procedures. 


Results 


The adjusted and unadjusted means and 
standard deviations of percentage of ap- 
propriate behavior for experimental and 
control groups across pre- and postinter- 
vention phases are presented in Table 1. 
The means, adjusted for peers’ data, show a 
7% gain for experimental subjects and a 2% 
loss for controls. 

The analysis of covariance produced a
significant interaction, F(1, 18) = 4.89, p <
.05, with only the experimental group sig-
nificantly increasing its level of appropriate
behavior from pre- to postintervention.


Table 1
Proportion of Appropriate-Behavior Phase
Means (Experiment 1)

                          Preintervention           Postintervention
Group                     Unadjusted   Adjusted     Unadjusted   Adjusted
Experimental condition
  Subject
    M                     .724         .735         .813         .806
    SD                    .104                      .070
  Peer
    M                     .781                      .808
    SD                    .072                      .083
Control condition
  Subject
    M                     .710         .724         .721         .703
    SD                    .061                      .103
  Peer
    M                     .776                      .822
    SD                    .068                      .052

Note. In the experimental condition, n = 11; in the control
condition, n = 10.


Discussion


The results of this study demonstrated the
feasibility of using local trainers with skills
in behavior modification to train teacher-
consultants in the CLASS Program. The
fact that they were able to successfully train
others to apply the CLASS Program effec-
tively without personally having imple-
mented the program before has powerful
implications for its widespread dissemina-
tion. With the assistance of the training
materials, behaviorally sophisticated per-
sonnel within local school districts or colleges
would be able to train themselves and their
colleagues to successfully implement the
CLASS Program. Subsequent research,
however, will be required to determine the
extent to which consultant trainers must be
trained in CLASS Program procedures and
in behavior modification skills.

A search of the students’ files in the dis-
trict office approximately 11 years after
termination of the project showed that 1 of
the 12 experimental (8%) and 3 of the 11
control (27%) subjects had been placed in
self-contained special education classrooms.
Although the numbers are small, the data
suggest that the CLASS Program may in-
deed include some long-term preventive ef-
fects as well.


Experiment 2: Training Consultants 
to Implement the CLASS Program 


In the second study, training was con- 
ducted by CORBEH personnel directly in- 
volved in the development of the original 
program procedures. Experiment 2 con- 
tained two additional observation phases to 
provide data about the level of each child’s 
performance (a) during the actual imple- 
mentation of the program and (b) in the 
following academic year, in new classrooms 
with new teachers and different peer 
groups. 

Sixteen consultant trainees, eight from 
each of two school districts, participated. 
The trainees were from the Honolulu School 
District, Honolulu, Hawaii, a large metro- 
politan area of approximately 325,000, and 
the Windward Oahu School District, Ka- 
neohe, Hawaii, a rural agricultural area of 
approximately 30,000. The Honolulu 
trainees consisted mostly of school counsel- 
ors. The majority of the Windward Oahu 
trainees were Diagnostic Prescriptive 
Teachers, a form of resource teacher. 


Method 


The training procedures in Experiment 2 differed
from those in Experiment 1 in that they occurred during
two workshops of approximately 1-2 days each,
2½ weeks apart. The first dealt with the selection
procedures and the second, with the implementation proce-
dures. As in the first study, a variety of techniques,
including discussions, videotapes, and roleplaying plus
feedback, were used to build in conceptual and behav-
ioral mastery of the CLASS Program procedures. Two
children were selected by each trainee according to the
established CLASS Program criteria, and one of
each pair was randomly assigned to either the experi-
mental or the control group.

Both consultants and classroom teachers
were paid in order to compensate them for the extra
effort involved in implementing the procedures. Con-
sultants received $450, and teachers of experimental
children were paid in $50-$100 payments at predetermined
points in the program. Teachers of control children
were paid the single lump sum of $50 for their initial
efforts in securing an appropriate candidate.


Results


Adjusted and unadjusted means for the 
subjects and their peers are presented in 
Table 2. A two-way analysis of covariance 
was conducted over treatment (experimental 
vs. control) and phases (preintervention vs. 
during vs. postintervention), with peer data 
as a covariate to control for variation in 
classroom stimulus conditions. A significant 
interaction, F(2, 51) = 11.86, p < .001, 
clearly demonstrated the ability of the 
CLASS Program procedures to increase the 
proportion of appropriate classroom be- 
havior of the experimental children. Post 
hoc analyses showed a significant increase in 
the experimental group’s mean level from 
preintervention to during intervention (p <
.01), with a slight but nonsignificant decrease
to postintervention. In contrast, the control 
group showed no changes during the same 
time period. 

The follow-up data indicated gains for 
both groups over their postintervention 
levels. However, an analysis of covariance 
produced a significant difference between 
the two groups, F(1, 23) = 4.57, p < .05, 
which indicates that the control group's in-
crease did not approach that of the experi-
mental group.


Discussion 


The results clearly demonstrate that
consultants can be trained within a relatively
brief period to successfully implement the
CLASS Program under field conditions.
The gains made by the experimental group
were shown to maintain at postintervention
levels after the program procedures had
terminated and into the next academic
year.


Table 2
Proportion of Appropriate-Behavior Phase Means (Experiment 2)

                          Preintervention          During intervention      Postintervention         Follow-up
Group                     Unadjusted   Adjusted    Unadjusted   Adjusted    Unadjusted   Adjusted    Unadjusted   Adjusted
Experimental condition
  Subject
    M                     .615         .623        .779         .763        [illegible]  .728        [illegible]  .822
    SD                    .101                     .071                     .081                     [illegible]
  Peer
    M                     .698                     .736                     .722                     [illegible]
    SD                    .105                     .099                     .082                     [illegible]
Control condition
  Subject
    M                     .665         .674        .674         .668        .663         .673        .803         .778
    SD                    .093                     .108                     .109                     .080
  Peer
    M                     .696                     .720                     .693                     .843
    SD                    .058                     .057                     .060                     .042

Note. In the experimental condition, n = 16; in the control condition, n = 17. Follow-up data were analyzed separately on 14
experimental and 12 control subjects.


School files were examined 3 years after
the program to determine each child's edu-
cational placement and the degree of special
services being used. Of the eight experi-
mental children still in the school district,
63% (five) were functioning normally in
regular classrooms, and 37% (three) had been
referred for or were receiving special services.
Of the 11 control children found, only 36%
(four) were functioning without aid in reg-
ular classrooms, 27% (three) had been re-
ferred for or were receiving additional ser-
vices, and 36% (four) were in self-contained
special education classrooms. These data
and the similar results in Experiment 1 have
important implications. They suggest that
children who have experienced the CLASS
Program will require reduced levels of spe-
cial services in the long run and, thus, a less
costly education.
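
The placement percentages above can be recomputed from the reported counts. The short sketch below does so to one decimal place; the category labels and the function name are paraphrases introduced for the example. The article reports whole percentages, so small rounding differences are expected.

```python
def placement_percentages(counts):
    """Convert placement counts to percentages of the group, to one decimal."""
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


# Counts as reported in the text for Experiment 2.
experimental = {
    "regular classroom, functioning normally": 5,
    "referred for or receiving special services": 3,
}
control = {
    "regular classroom, functioning without aid": 4,
    "referred for or receiving additional services": 3,
    "self-contained special education classroom": 4,
}

print(placement_percentages(experimental))
print(placement_percentages(control))
```

With groups of 8 and 11 children, a single child shifts a percentage by roughly 9 to 13 points, which is worth keeping in mind when comparing the two groups.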

The increases shown by the control group 
in the follow-up phase were surprising, but 
they replicated similar findings by Kent and 
O'Leary (1976, 1977). Several hypotheses 
are available to help clarify these results. 
Kent and O'Leary (1976) suggested that 
problem behavior may be cyclical and that 
children are referred when their level of 
disruptive behavior is relatively high. This 
may have been the case in the present study. 
Consultants were asked to solicit problem 
children from teachers in their respective 
schools. It is possible that only children who 
were at an apex of a disruptive-nondisrup-
tive cycle were referred.

A second hypothesis concerns potential 
bias in the observation data. Eight of the 13 
observers used in the follow-up phase were 
new. Notwithstanding the training that 
observers received and high interobserver 
agreement, a systematic bias may have been 
operating in this group to record behavior as 
more appropriate for both experimental and 
control children. Tentative support for this
explanation can be seen in the peer data,
which also showed marked increases in the
follow-up phase. Kent and O’Leary (1976)
collected peer follow-up data in their studies,
but unfortunately the means cannot be de-
termined from their method of data pre-
sentation. It is clear that more follow-up
research with both experimental and control
groups is required to precisely understand
why such increases occur.


General Discussion 


characteristics about the effectiveness of the
CLASS Program. First, the extent to which
CLASS was effective in changing the be-
havior of acting-out children across such
diverse classroom settings and field sites
emphasizes the external generalizability of
the program. Since the settings were so di-
verse, ranging from rural to urban com-
munities and across 28 different classrooms,
the power of the CLASS Program across
settings is well documented.

trainees themselves in Experiment 2. The
results demonstrate that the procedures can
be used effectively by regular school per-
sonnel in real-world conditions, a finding too
infrequently noted in the experimental lit-
erature.3 Since the procedures were im-
plemented by 28 different consultants with
little monitoring by experimenters or local 
program coordinators, it is likely that each 
consultant-teacher-child interaction was 
unique. The success of the program across 
such a variety of regular school personnel 
further emphasizes the program’s generali- 
zability. 

Third, replications of the program’s ef- 
fectiveness under two quite different 
training formats also lend support to its re- 
liability and usefulness. The training was 
conducted at several levels removed from the 
actual target of the CLASS Program, that is, 
the acting-out child. At the farthest level, 
teacher-consultant trainers were trained; 
they in turn trained teacher-consultants, 
who in turn trained classroom teachers and 
supervised their application of the program 
in the process of changing the disruptive 
behavior of acting-out children. The results 
suggest not only that the procedures are 
powerful but that the delivery system is 
sufficiently well constructed to ensure their 
correct application. 

Fourth, the cost-effective delivery system 
demonstrated in the present report lends 
considerable power to the CLASS Program 
as a viable treatment alternative for use in 
the educational setting. Requiring no more 
than 1-2 days to train consultant trainers 
and 3-4 days to train consultants makes the 
program potentially available to many spe- 
cial and regular educators at relatively little 
cost. Given the current demands for pro- 
viding services for handicapped children, 
programs that have been demonstrated to be 
effective and require relatively brief training 
periods are likely to be very appealing to 
consumers and may have the greatest impact 
on the educational system. Ratings on a
7-point Likert scale sampling trainees’
preferences indicated consistently high
ratings of the training procedures (5-6) and
satisfaction with the overall program.
Finally, the long-term follow-up data
obtained from the children's school files
strongly indicate potential preventive as-
pects of the CLASS Program with important
implications for future educational practice. 
Given the likelihood of gains to teachers 
through easier management of once-dis- 
ruptive children, the data also suggest that 
the program may indeed reduce future ed- 
ucational costs. With the increasing ex- 
pense of education generally, and the 
placement of each handicapped child in the 
least restrictive educational setting as now 
required under Public Law 94-142, these 
findings are particularly noteworthy. 
However, the final measure of the program's
effectiveness will be the extent to which it is 
adopted and used by school personnel who 
are charged with responding to problems 
posed by acting-out children. 


3 It is acknowledged that course credit or honoraria 
paid to the participants for implementation did not 
provide for assessment of the program when imple- 
mented by poorly motivated personnel. Further re-
search is required to answer this question.


This research studied achievement patterns and the relationship of other-di-
rection and attitudes toward women to achievement in 43 tenth-grade females
with IQ scores above 110. Underachievers were subjects whose high school
grade average was below the class mean. Achievement indices consisted of
high school grades and scores on the Stanford and Iowa achievement tests.
Grade averages from Grades 2 through 10 were examined. Achievement pat-
terns revealed a significant difference between the grades of achievers and un-
derachievers beginning at Grade 6. Other-direction and attitudes toward
women were significantly related to mathematical achievement test scores and
high school grades. Results are discussed in relation to research in locus of
control and factors contributing to adult underachievement in bright
females.


The relative paucity of research con- 
cerned with the early identification of un- 
derachievers and the variables associated 
with this underachievement has been noted 
in a recent review (Asbury, 1974). The need
for further, cohesive investigation of un-
derachievement is particularly evident in the
area of underachievement in bright females.
Shaw and McCuen (1960) first reported sex
differences in the patterns of under-
achievement in bright high school students.
Their data revealed that while male under-
achievers demonstrated a consistent pattern
of underachievement from first grade
through high school, female underachievers
did not differ from female achievers until
Grade 9, with a downward trend in the for-
mer group's grades beginning at Grade 6.
These patterns of achievement would appear
to have significant import for the investiga-
tion of underachievement; however, the
implications of these findings have not been 
pursued. The present study was designed 
for this purpose. 

The significance of Shaw and McCuen’s 
findings lies in the decline in the female un- 
derachievers’ grades in early adolescence. 
The present research considers two aspects 
of adolescence as having a potential impact 
on this decline: (a) the increasing influence 
of peers and peer values that are incongruent 
with academic achievement and (b) the in- 
creasing awareness of one’s sex role. 

Adolescence is a period of increasing in-
fluence of one's peers and peer values and a
diminishing role of one's parents as a pri-
mary reference group (Adams, 1973). As a
result, the adolescent finds that different 
behaviors are being reinforced. While par- 
ents may have encouraged academic 
achievement in the early school years, the 
bright female adolescent may find that her 
peer group does not view this type of 
achievement as congruent with her sex role 
(Stein & Bailey, 1973). Parents may sup- 
port these peer values by modifying their 
own reinforcement of their adolescent 
daughters to be more consistent with their 
perception of her sex role (Bardwick, 1972). 
With these pressures being brought to bear,
those females who are most susceptible to
the influence of others, or more other-di- 
rected, may modify their achieving behaviors 
to be more congruent with the values of their 
peers. 

The increasing awareness of one’s sex role 
in adolescence suggests that the female ad- 
olescent’s attitudes concerning the roles of 
women are also a potential factor in the de- 
cline in achievement. This hypothesis is 
supported by studies of females of wider 
ability ranges (Houts & Entwistle, 1968) as 
well as research in the area of achievement 
motivation (Alper, 1973). 

Scales of other-direction and attitudes 
toward women’s roles were selected to mea- 
sure the impact of these factors. Other- 
direction is concerned here with the degree 
of influence of others’ opinions upon one’s 
actions and beliefs. It was anticipated that 
those subjects revealing a stronger other- 
direction orientation and a more traditional 

view of women’s roles would be more in- 
fluenced by these changes in adolescence 
and, hence, would demonstrate lower levels 
of achievement than subjects with opposing 
orientations. The measure of other-direc- 
tion was designed to reflect that factor of 
locus of control having to do with one’s sus- 
ceptibility to the opinions of others (Collins, 
Martin, Ashmore, & Ross, 1973). 

The concept of locus of control has come 
under criticism in recent years as a multidi- 
mensional construct. Two of the more 
common factors identified are (a) the basic 
predictability of one’s environment and one’s 
skill at manipulating that environment and 
(b) the origin of one’s goals and motives.
Other-direction, then, is a measure of this
latter aspect of locus of control, which does
not incorporate the personal potency factor
of locus of control. Past research has often 
resulted in conflicting findings concerning 
the relationship between locus of control and 

achievement, particularly with females 
(Collins et al., 1973; Crandall, Katkovsky, & 
Crandall, 1965; Maccoby & Jacklin, 1974); 
externality has not been consistently related 
to lower achievement as hypothesized. This 
research may have been confounded by the 
variable of other-direction, which in itself 
might be a more powerful predictor of aca-
demic achievement for females than the
multidimensional locus of control con-
struct.

Verbal and mathematical achievement 
test scores as well as school grades were se-
lected as indices of achievement. The use
of achievement test scores allows an expan-
sion upon the findings of Shaw and McCuen,
who measured achievement declines only in
terms of high school grades. The examina-
tion of achievement test scores serves to
further document underachievement. In
addition, the mathematical and verbal scores 
permit an examination of these separate 
areas of achievement. Students have been 
found to view mathematics as a more mas- 
culine subject area than verbally oriented 
skills such as reading (Stein & Smithells, 
1969). Given the hypothesized effects of 
other-direction and women’s roles in this 
research, the masculine connotation of 
mathematics necessitates its consideration 
as a separate area of achievement. Mean-
while, the use of school grades permits a
comparison with the Shaw and McCuen
findings as well as an examination of general
academic achievement trends.

The purpose of this research then is to 
attempt to replicate the findings of Shaw 
and McCuen (1960) and to investigate the 
relationship of other-direction and women’s 
roles to achievement in adolescence. In so
doing, it was hoped that some clarity might 
be brought to the conflicting findings con- 
cerning both female achievement and locus 
of control and its relationship to achieve- 
ment. 


Method 
Subjects 


The subjects were identified from a tenth-grade class
of 330 students in a primarily middle-class high school
in San Antonio, Texas. The 43 subjects consisted of all
the female students within that class who had received
IQ scores of above 110 on the Otis-Lennon Test on
Mental Maturity (Otis & Lennon, 1967). The subjects
were classified as achieving or underachieving based
upon a class average of 79.62 for grades obtained in
Grades 9 and 10. These procedures are consistent with
those used by Shaw and McCuen. As all subjects were
of above-average ability level, those subjects whose high
school grade average was below the class mean were
classified as underachievers (n = 14); those whose grade
average was above the class mean were classified as
achievers (n = 29). The two groups did not differ in
mean IQ scores.


Table 1
Significance of Differences Between Mean Grades of Achievers and Underachievers From
Grades 2 Through 10

          Underachievers             Achievers
Grade     n      M        SD         n      M        SD        F
2         10     86.86    5.82       5      88.51    2.72      .35
3         10     84.37    6.00       7      87.18    7.96      .69
4         10     83.67    5.17       12     87.88    4.75      3.96
5         11     84.08    5.98       16     87.62    5.13      2.71
6         12     81.64    6.09       19     88.41    3.43      15.77*
7         14     79.64    6.75       25     87.31    4.00      19.99*
8         14     76.77    6.05       29     86.71    4.00      38.11*
9         14     74.04    5.15       29     86.99    3.36      79.32*
10        14     78.04    5.80       29     88.94    4.35      44.23*

*p < .01.
Measures


Achievement. Indices of academic achievement
consisted of yearly grade averages in academic, none-
lective subjects for Grades 2 through 10 and the mean
percentile scores on verbal and mathematical achieve-
ment tests administered in Grades 9 and 10. These
latter measures consisted of scores on the Iowa Tests of
Educational Development (Lindquist & Feldt, 1963) in
Grade 9 and scores on the Stanford Achievement Test
(Gardner, Merwin, Callis, & Madden, 1965) in Grade
10. As academic grades may be subject to some criti-
cism due to the subjective manner in which they are
assigned, achievement test scores were used to examine
whether achievement declines were also reflected in
standardized performance measures. Academic grades
from Grades 9 and 10 were used to obtain a mean high
school grade average for analyses of variance involving
other-direction and women's role.

Other-direction. This measure consisted of a 10-
item scale constructed by Collins et al. (1973) and
modified for the purpose of this study to counter a
response set bias in the original scale. The
scale is designed to measure the degree to which one is
influenced by others. Collins et al. (1973) define a
person scoring high on this scale as one who “explicitly
acknowledges that the direction for his own behaviors
comes from other people, e.g., ‘I live too much by other
people's standards’ ” (p. 478). Items on the scale have
been shown to cluster with a minimum factor loading of
.40 in a factor analysis of 63 items (Collins et al.,
1973).

Attitudes toward women's roles. This scale was
designed to measure subjects’ attitudes toward various
aspects of the traditional role of women. The 10-item
scale consisted of 8 items selected from the Attitudes
Toward Women Scale (Spence, Helmreich, & Stapp,
1973) and 2 items used by Houts and Entwistle (1968)
with female adolescents to measure general attitudes
toward women. Items were selected on the basis of
their relevance to adolescents as well as data concerning
their ability to discriminate between traditional
and nontraditional sex role orientations (Tully, 1973). Al-
though the questions covered several areas, an emphasis
was placed upon the selection of items involving career
roles and achievement.


Procedure 


Subjects were administered the measures of other- 
direction and attitudes toward women’s roles by their 
homeroom teachers. Following the receipt of parental 
consent forms, achievement data were collected from 
school records. Subjects were then classified as 
achievers or underachievers on the basis of their average 
semester grades in Grades 9 and 10, and subsequent 
data analyses were performed. 


Results 


data for each year were first ana- 
lyzed for evidence of achieving and under- 
achieving patterns. Table 1 shows the mean 
annual grades of underachieving and 
achieving groups from Grades 2 through 10 
and the results of F tests performed at each 
grade level to test for significant differences 
between groups. The analyses reveal a significant
difference (p < .01) in the grades of
underachieving and achieving groups be- 
ginning at Grade 6 and continuing through 
Grade 10. (Significant differences at Grades 
9 and 10 are due, in part, to the classification 
process of underachievers and achievers; 
however, these analyses do serve to illustrate 
that the differences between the two groups 
are significant in these years). Prior to 
Grade 6, the grades of the two groups do not 
differ significantly. Examination of the 
grade patterns indicates that the change is 
due to a drop in the grades of the under- 
achieving group, while those of the achieving 
group remain remarkably stable. 


Table 2
Significance of Differences Between High School Achievement Test
Scores of Achievers and Underachievers

                   Underachievers      Achievers
Achievement test     M       SD        M       SD        t
Verbal             55.84   17.69     73.37   13.60    3.50**
Mathematical       47.68   16.28     63.35   19.05    2.57*

Note. For underachievers, n = 14; for achievers, n = 29.
* p < .05.
** p < .01.


Comparisons of mean achievement test
scores of underachievers and achievers
in Grades 9 and 10 are shown in Table 2.
These findings further validate the lower
achievement of the underachieving group by
demonstrating their underachievement on
objective standardized measures.
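
As a rough check on the comparisons in Table 2, a t statistic can be recomputed from the printed means, standard deviations, and group sizes. The sketch below assumes a pooled-variance (Student) t test for independent groups; the article does not say which variant was used, and the function name `pooled_t` is only illustrative. Under that assumption the recomputed values come out near 3.6 (verbal) and 2.6 (mathematical), close to but not identical with the printed 3.50 and 2.57.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, from summary statistics.

    Assumes equal population variances (pooled-variance form).
    Returns the t statistic and its degrees of freedom.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean2 - mean1) / standard_error, df


# Summary values as printed in Table 2 (underachievers first, then achievers).
t_verbal, df = pooled_t(55.84, 17.69, 14, 73.37, 13.60, 29)
t_math, _ = pooled_t(47.68, 16.28, 14, 63.35, 19.05, 29)

print(f"verbal: t({df}) = {t_verbal:.2f}")
print(f"mathematical: t({df}) = {t_math:.2f}")
```

The small gap between the recomputed and printed values may reflect rounding in the table or a different test variant; the sketch is not meant to settle which.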

A series of two-way analyses of variance 
was then performed in order to examine the 
relationship between the independent vari- 
ables of other-direction and attitudes toward 
women’s roles and the various indices of 
achievement. Achievement scores rather 
than the nominal classifications of achiever 
and underachiever were used in these anal- 
yses in order to examine the full range of 
variability in achievement. Achievement
indices for these analyses consisted of mean
high school grades and achievement test
scores in Grades 9 and 10.

A significant main effect was found for the 
measure of other-direction with mathe- 
matical achievement test scores, F(1, 41) = 
7.81, p < .01, with those subjects scoring
below the median on the measure of other- 
direction having higher achievement scores 


Results of analyses for subjects' attitudes
toward women's roles are shown in Table 3.
Significant main effects for attitudes toward
women's roles were found with both high
school grades (p < .01) and mathematical
achievement test scores (p < .05), with those
subjects revealing a more liberal orientation
on the women’s role measure having higher
scores on those two achievement measures
than those subjects revealing a more traditional
orientation toward women. The relationship
with verbal achievement was not
significant. Significant interactions between
other-direction and women's roles
were not demonstrated in any achievement
analysis.


Discussion 


[...] significant differences between achieving and
underachieving groups were found in the
early elementary grades. As in the Shaw
and McCuen research, the status of the
underachievers was due to a decline in their
grade averages in the middle school years.
The earlier decline in the achievement level
of the underachieving group in this study
may be partially attributable to changes in
the adolescent subculture from the time [...]

[...] lend support to the hypothesized
relationship between other-direction and
academic achievement in the area of
mathematics. Research has supported the
traditional notion that females tend to perform
more poorly in mathematical areas than do
males, particularly in the later school years
(Maccoby & Jacklin, 1974). These findings
suggest that as girls pick up the notion that
mathematics is a masculine area of expertise,
those who are more subject to the influence of
others perform less well in the mathematical
area than do those who are less susceptible to
the opinions of their peers. The achievement
measures of overall grade average and verbal
achievement test scores may not reflect the
influence of peer values so strongly because
these areas are more androgynous in
connotation (Stein & Smithells, 1969).

Table 3
Achievement Means and Standard Deviations by Orientation Toward
Women's Roles

                              Traditional       Liberal
Achievement measure            M      SD       M      SD        F
High school grades           81.80   7.70    86.90   5.01    6.54**
Verbal achievement scores    67.80  18.57    74.71  14.50    1.88
Math achievement scores      53.90  19.54    68.57  18.22    6.92*

Note. For traditional, n = 22; for liberal, n = 21.
* p < .05.
** p < .01.

The results concerning other-direction
and mathematical achievement are complemented
by the findings concerning the
relationship between subjects’ attitudes
toward women’s roles and their achievement.
Previous research reveals an increased incidence
of underachievement for bright females
in college and postschool years (Bayley
& Oden, 1955; Maccoby & Jacklin, 1974).
At this stage, the relationship seen here between
mathematics, other-direction, and
women's roles may become a relevant factor
in other achievement areas concerning career
choice and later career success, which may
have the same negative connotation as
mathematics in that certain career choices
are less likely to receive peer approval than
are more sex-appropriate areas of expertise.

Both self-esteem and locus of control have
been hypothesized as contributing factors to
the drop in women's achievement in the
[illegible] world; however, neither concept
has been supported in the literature because
differences between males and females have
[illegible] (Maccoby & Jacklin, 1974).
Rather than locus of control, the
drop in women's achievement may be due to
the combination of the factors of other-direction
and a traditional view of women’s roles.
Research has demonstrated that by adolescence
the negative affect toward female
[illegible] is already established in both males
and females (Monahan, Kuhn, & Shaver,
1974). The results of the present study
suggest that those females who are more
other-directed and traditional in their view
of women's roles are more directly affected
by these negative connotations.

Previous conflicting findings on the rela- 
tionship between locus of control and 
achievement could have been confounded by 
the factor of other-direction, which while 
related to externality, is a subset of the 
construct that excludes the personal potency 
factor. Future investigations of achieve- 
ment, particularly those involving females, 
should take this factor into account. The 
relationship of other-direction to male 
achievement should also be examined, al- 
though it seems likely that the same rela- 
tionship would not be demonstrated given 
societal attitudes concerning male achieve- 
ment. 
Dimensions of Creativity in Children’s Drawings:
A Social-Validation Study 


Bruce A. Ryan and Andrew S. Winston 
Creative drawings of three preschool girls were modified using reinforcement
procedures in a multiple baseline design. Clear changes were produced in the
diversity of color and form. When selected drawings were shown to adult
judges, higher ratings of creativity were given for drawings with increased
form diversity but not for drawings with increased color diversity. The results
are discussed in terms of the role of social validation in defining and
encouraging creativity.


A number of recent studies have at- 
tempted to analyze and modify creative be- 
havior in young children by the use of con- 
tingent reinforcement. Those studies by 
Goetz and her colleagues (Fallon & Goetz, 
1975; Goetz & Baer, 1973; Goetz & Salmon- 
son, 1972; Holman, Goetz, & Baer, 1977) 
have involved notions of response diversity 
and novelty in children’s block building and 
drawing, while Ballard and Glynn (1975) and 
Maloney and Hopkins (1973) analyzed 
creative writing. 

These studies have established that vari- 
ous aspects of writing, drawing, or block 
building can indeed be brought under rein- 
forcement control and that some degree of 
generalization is possible (Holman et al., 
1977). It is unclear, however, whether the 
products produced by the children should be 
said to show increased “creativity.” Nor is
it clear how we should establish that the 
products were more creative, given the in- 
tense disagreement in the creativity litera- 
ture (see Nicholls, 1976). 

One possible, if partial, solution lies in the
use of a social-validation procedure (e.g.,
Fawcett & Miller, 1975) where independent
judges could be asked to rate the creativity
of artistic products. The studies of
creative writing (Ballard & Glynn, 1975; 
Maloney & Hopkins, 1973) demonstrated 
that some products produced under rein- 
forcement were rated by independent judges 
as showing increased creativity as compared 
to baseline work. The studies on children's 
drawing and block building provide no such 
evidence, possibly because diversity and 
novelty have an honored place in the litera- 
ture on creativity. That is, it may be argued 
that increased novelty and diversity imply 
increased creativity by definition. Whether 
the adults who regularly interact with children
would agree remains an open and important
question. Moreover, there are a
number of possible dimensions in children's 
artwork that might be encouraged through 
reinforcement: geometric forms, colors, 
lifelike objects, spatial arrangement of 
objects, and the like. Not all of these di- 
mensions may be relevant to judgments of 
creativity. It would be helpful to those engaged
in encouraging children’s artistic endeavors
to know just what dimensions people
[illegible] when they refer to one
product as “more creative” than another.

The present study involved the use of re- 
inforcement for two dimensions of children’s 


creativity. 
Method


Production of Drawings 


Subjects. The subjects were three girls, Anne, Betsy, 
and Cathy, aged 5 years 2 months, 3 years 4 months, and 
5 years 1 month, respectively. They were chosen from 
a group of 20 in a university preschool program because 
of their high interest in drawing activities. 

Procedure. Each of the children produced two
drawings per day over a 3-week period using a set of 18 
different colored felt-tipped pens. The children were 
limited to 5 minutes for each drawing. The first week 
for all children constituted a baseline period. In that 
phase the children were simply asked to draw whatever 
they wished, and the female experimenter acted gen- 
erally interested in the drawings regardless of their 
content or character. 

In the second week one child received reinforcement 
for producing drawings containing a variety of colors, 
and the other two received reinforcement for displaying 
a variety of forms. Reinforcement consisted of an op- 
portunity to see cartoon movies which was contingent 
upon each child reaching an accelerating criterion based 
on previous session performances. The drawings were 
scored immediately on completion for color or form 
diversity. Color scoring consisted simply of counting 
the number of different colors used, and form scoring 
was based on a checklist of forms adapted from Fallon 
and Goetz (1975). Reliability of these ratings was 
checked by a second rater. The ratings for color
correlated .98, and ratings for form correlated .82.

During the third week all the children received
reinforcement for both color and form diversity. Thus, the
overall plan is a simple version of a multiple baseline 
design in which form and color diversity were separately 
monitored. Reinforcement was applied first to one and 
then to both of these behaviors. 
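
The scoring rule described above (count the different colors used, then check agreement between two raters by correlation) can be sketched as follows. This is a minimal illustration only: the function names, the color labels, and the two raters' scores are hypothetical, and the form checklist adapted from Fallon and Goetz (1975) is not reproduced here.

```python
import math


def color_score(colors_used):
    """Color diversity of one drawing: the number of different colors used."""
    return len(set(colors_used))


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical record of the pens used in one drawing.
drawing = ["red", "blue", "red", "green", "blue"]
print(color_score(drawing))

# Hypothetical scores given to the same five drawings by two raters.
rater_a = [4, 7, 9, 12, 15]
rater_b = [4, 8, 9, 11, 15]
agreement = pearson_r(rater_a, rater_b)
print(round(agreement, 2))
```

A correlation measures whether the raters order the drawings alike, not whether they assign identical counts, so it is a fairly lenient index of reliability.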


Creativity Ratings 


color were chosen, while the reverse was true for Cathy.
The drawings chosen from the third phase were those 
high in both color and form diversity. These selection 
procedures were followed so that the drawings pre- 
sented to the judges would clearly differ in color and 
form diversity. The result of this selection procedure 
was a total of six sets of three drawings, which were 
shown one set at a time to independent judges. 
Judges. Two different sets of judges were chosen to 
rate the drawings: a group of 21 undergraduate stu- 
dents in an early childhood education curriculum 
and a group of 21 mothers of preschool children.
Rating procedures. The ordering of the three 
drawings in each set and the presentation order of the 
sets were randomized. Thus, drawings produced by the
same child in different reinforcement phases were
shown three at a time to the judges. The judges were
then asked to indicate on a standardized form which
drawing in each set was most creative, moderately
creative, and least creative. Similarly they were asked
to rank the drawings as most liked, moderately liked,
and least liked.


Results 


Production of Drawings 


The child’s form and color scores on the
two drawings in each session were summed
to yield a daily total of forms and colors,
respectively. The effects of the reinforcement
procedures on color and form diversity in the
children’s drawings are shown in Figures 1,
2, and 3. Following baseline, Anne and
Betsy received reinforcement first for form
diversity and then for both form and color
diversity.

Figure 1. Number of different forms and colors used
by Anne during each session. (Scores represent sum
of two drawings.)

Figure 2. Number of different forms and colors used
by Betsy during each session. (Scores represent sum
of two drawings.)

For Anne (see Figure 1), form diversity
showed a clear increase from a baseline mean
of 16 to a mean of 27.4 during reinforcement.
The effect of the reinforcement was replicated
with color diversity; mean number of
colors used increased from 7.6 during baseline
to 20.8.

Betsy (see Figure 2) showed a modest increase
in form diversity. Mean number of
forms rose from 17.8 during baseline to 23
during reinforcement. There was, however,
a drop in form diversity during the final two
sessions. The effect of the reinforcement on
color diversity was much more striking;
mean number of colors increased from 5.0 to
[illegible].

In contrast to the other two children,
Cathy received reinforcement first for color
diversity alone and then for both color and
form diversity (see Figure 3). Color diversity
increased from a mean of 8 during
baseline to 33.6 under reinforcement. The
effect of the reinforcement on form diversity
was not so clear, however. Cathy's baseline
level of form diversity was very high (M =
25.1). Since this baseline level was as high
as the other children’s level of form diversity
under reinforcement, it is likely that Cathy’s
use of forms was at or near a ceiling. It
should be noted that form diversity showed
a declining trend during baseline and that
the introduction of reinforcement reversed
this trend (M = 28.5 during reinforcement).

In sum, reinforcement had clear effects on
color diversity for all three children, and on
form diversity for two of the three children.

Figure 3. Number of different forms and colors used
by Cathy during each session. (Scores represent sum
of two drawings.)


Creativity Ratings 


The rankings of creativity were converted 
into ratings on a 3-point scale, with 3 indi- 
cating most creative. These ratings were 
subjected to a two-way (Judges X Phases) 
analysis of variance. Drawings from Anne 
and Betsy in which the phases were baseline, 
reinforcement for form, and reinforcement 
for both color and form were analyzed sep- 
arately from Cathy’s drawings, in which the 
middle phase was reinforcement for color 
only. 
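
The conversion of rankings to 3-point ratings described above can be illustrated with a short sketch. The judges' rankings below are invented for illustration, and the phase labels and function name are assumptions rather than anything taken from the study's materials; the analysis of variance itself is not reproduced.

```python
# Rank labels from the rating form mapped onto the 3-point scale
# (3 = most creative), as described in the text.
RANK_TO_RATING = {"most": 3, "moderately": 2, "least": 1}


def mean_ratings(rankings):
    """Average rating per phase, collapsed across judges.

    rankings: one dict per judge, mapping a phase name to a rank label.
    """
    totals = {}
    for judge in rankings:
        for phase, label in judge.items():
            totals[phase] = totals.get(phase, 0) + RANK_TO_RATING[label]
    return {phase: total / len(rankings) for phase, total in totals.items()}


# Hypothetical rankings of one set of three drawings by three judges.
judges = [
    {"baseline": "least", "form": "moderately", "form+color": "most"},
    {"baseline": "least", "form": "most", "form+color": "moderately"},
    {"baseline": "moderately", "form": "least", "form+color": "most"},
]

means = mean_ratings(judges)
for phase, value in means.items():
    print(f"{phase}: {value:.2f}")
```

Because each judge must assign the ratings 3, 2, and 1 exactly once per set, the ratings within a set are not independent of one another, which is worth keeping in mind when reading the phase means.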

In the ratings of Anne and Betsy’s drawings,
there was a significant main effect for
reinforcement phases, F(2, 80) = 25.9, p
< .001. There was no effect for judges, and
the Judges X Phases interaction was
nonsignificant. That is, parents and students
did not differ in their ratings. Mean
creativity ratings, collapsed across judges,
were 1.56 for baseline, 2.20 for drawings high
in form diversity, and 2.30 for drawings high
in both form and color diversity. Post hoc
(Newman-Keuls) analysis revealed that
drawings high in form diversity were rated
as significantly more creative than baseline
drawings (p < .05). Drawings high in both
form and color diversity were not rated as
more creative than those high only in form
diversity.

Ratings of Cathy's drawings were also
subjected to a Phases X Judges analysis of
variance. The main effect for judges and the
Phases X Judges interaction were nonsignificant,
and as with the other drawings,
parent and student ratings did not differ.
The main effect for reinforcement phases
was significant, F(2, 80) = 28.5, p < .001.
The mean creativity ratings, collapsed across
judges, were 2.10 for baseline, 1.43 for
drawings high in color diversity, and 2.50 for
drawings high in both color and form diversity.
Post hoc analyses indicated that
drawings high in color diversity only were
rated as significantly less creative than
baseline drawings (p < .05). However,
drawings high in both color and form diversity
were rated as more creative than baseline
drawings or drawings high in color
diversity only (p < .05).

Ratings of creativity were reasonably 
closely related to ratings of liking. Pearson 
product-moment correlation coefficients 
between these ratings were calculated for the 
baseline, form reinforcement phase, and 
color/form reinforcement phase for Anne 
and Betsy. Similar calculations in the three 
phases were made on Cathy’s data. The
resulting six correlation coefficients ranged
from .35 to .63 with a median value of .53.
All differed significantly from zero beyond
the .01 level.


Discussion 


The results of the present study are consistent
with earlier work (e.g., Fallon & Goetz,
1975; Goetz & Salmonson, 1972) in showing
that selected aspects of children's drawing
can be defined, reliably recorded, and modified;
the use of both forms and colors increased
during reinforcement phases. More
interesting, however, is the finding that increased
diversity in forms is more important
than color diversity in leading judges to rate
drawings as more creative. In fact, drawings
produced under reinforcement-for-color- 
only conditions were rated as significantly 
below baseline drawings in creativity. 

The present study illustrates the usefulness
of the social-validation procedure for
identifying which features in complex behavior
are most likely to evoke a meaningful
reaction in the social environment. Just as
Ballard and Glynn (1975) and Maloney and
Hopkins (1973) found that not all modified
writing behaviors led to increased creativity
ratings, the present study shows that increased
diversity in some behaviors is more
important than diversity in others. It is also
clear that there is no need to limit this kind
of analysis to relatively simple dimensions
such as form and color diversity. Other [...]
representational or real-life objects. Perhaps,
with some effort, it might be possible to develop
a scoring procedure to assess reliably
the degree to which objects in the drawings
are interrelated or “thematically integrated.”
The point is that a meaningful analysis of
fairly complex artistic behaviors can proceed
when a social-validation technique is employed.

In a fundamental sense the present study
is an analysis of some of the conditions that
lead people to use the term “creative” to
describe an artistic product. As an approach
to the study of creativity, these procedures
are quite different from the more traditional
methods (Getzels & Jackson, 1962; Torrance,
1965). In this tradition, the task of the researcher
is to examine the relation of creativity,
which has been defined on an a priori
basis, to a host of intellectual and social
variables; the assumption that the behaviors
specified as creative are so regarded by the
social community is typically not evaluated
empirically. The social-validation procedure
offers a means for making such an
evaluation.

There is an additional possible benefit of
social validation. One of the traditional
tenets of applied behavior analysis is that
behavior modification procedures tell us how
to change behavior but not what behaviors
to change (Sulzer & Mayer, 1972). Moreover,
the question of what behaviors should
be changed, particularly in normal classrooms,
has stirred a great deal of controversy
(see Winett & Winkler, 1972). Social validation
provides information on how the
community might respond to particular behavioral
objectives. This information could
then be used as a basis for selecting behaviors
to be changed. Specifically, information
on adult responses to dimensions of artwork
could be used in the design of art curricula.
However, there are two objections to using
social consensus to decide what is creative in
children’s artwork. First, the notion of
creativity would then be tied directly to the
preferences of a particular culture or subculture
at a particular time. A culture-bound
definition of creativity will not advance
our understanding of the common elements
in creative products across cultures
and historical periods.

Second, it may be undesirable to encourage
the view that creativity can be defined as
a particular set of dimensions to be taught.
Such a view would lead to rigid recommendations
on how art instruction should proceed.
These recommendations would not be
objectionable if art education is conceived
solely as a skill-building activity. However,
if art education is viewed as an occasion for
the child to engage in relatively unconstrained
expression, then a narrow conception
of creativity would be inconsistent with
this goal.

Thus, the use of social validation does not 
provide an easy way to select goals for art 
instruction. It can, however, be used to 
advance our understanding of the interac- 
tion between children’s creative efforts and 
the social environment. The study of how 
parents and teachers respond to children’s 
artwork will provide significant information 
on the role of social factors in encouraging 
and maintaining creative behavior. 
Higher-Stratum Ability Structures on a Basis
of Twenty Primary Abilities 


A. Ralph Hakstian 


University of British Columbia, Vancouver, Canada 


Raymond B. Cattell 


University of Hawaii 


Previous work has suggested that the well-known primary mental abilities
(e.g., Verbal Ability and Spatial Ability) exhibit relationships among
themselves that can be understood in terms of higher-stratum, or more general,
ability constructs. In the present study, 280 subjects were measured on 20
primary abilities using the newly standardized Comprehensive Ability Battery.
A factor analysis produced six oblique second-stratum factors. Four of
these were clearly identified as the capacities in Cattell’s triadic theory of
ability structure: Fluid Intelligence, Crystallized Intelligence, Visualization
Capacity, and General Retrieval Capacity. The correlations among the six
factors yielded three oblique third-stratum factors. Results at both strata
permitted integration with previous research and implications for a
hierarchical conceptualization of human abilities.


Since Thurstone (1938) and Thurstone
and Thurstone (1941) first established what
were to become well-replicated primary
mental abilities, and subsequent investigators
(Adkins & Lyerly, 1952; Botzum,
1951; Rimoldi, 1951) began exploring the
second-order domain in terms of these abilities,
there has been progress in both the
empirical and theoretical definition of ability
structure. In the experimental studies of the
last 15 years (Cattell, 1963, 1967, 1971;
[illegible] & Cattell, 1972, 1974; Horn, 1968;
Horn & Cattell, 1966), evidence has accumulated
that the second-stratum factors are
more numerous than a single “g” factor, such
as has been generally accepted since the
Thurstone-Spearman reconciliation. On a
broader basis of primary ability factors than
in earlier studies, the general capacities
of Fluid Intelligence (Gf), Crystallized Intelligence
(Gc), General Cognitive Speed
(Gs), Visualization Capacity (Gv), and
General Retrieval Capacity (Gr) have been
recognized, with varying degrees of clarity
of definition, from marker loadings in factor
analyses.

In terms of theoretical development, the
triadic theory of ability structure has been
proposed (Cattell, 1971), to which the principal
foil has been Guilford’s (1967) structure
of intellect model (see also Guilford &
Hoepfner, 1971), in which a large number
(120) of strictly uncorrelated, narrow primary
factors are hypothesized. The triadic
theory postulates three classes of ability
structures of varying breadth: (a) general
capacities (Gf, Gc, etc.), which are conceived
as parameters of total brain action; (b) provincial
powers, which are associated with the
five sensory and motor-kinesthetic areas and
are given the designation of p;¹ and (c)
agencies, which are environmentally produced
developmental investments of the
general capacities and provincial powers and
are chiefly represented by the primary
mental abilities.

The positioning and identification of these
abilities do not rest exclusively on their operationally
established level in higher-order
factor analyses. The provincial powers, for
example, are likely to appear as second-stratum
factors along with the capacities.
Ancillary evidence of various kinds is needed.
In the present research, however, the
choice of variables, for various reasons, sheds
light on only the general capacities, with the
solitary exception of the Visualization factor.
The present study, then, was conducted to
clarify the definition of these capacities and
to suggest further structural leads with implications
for the triadic theory.

¹ Because of the usual conventions of testing, only
Visualization Power (Pv) has been well defined; but if
the provincial powers class is subsequently verified,
then Gv would more precisely be Pv.


Method 


In a previous study (Hakstian & Cattell, 1974), clear
evidence was set out for 19 primary ability traits that
were uniquely isolated from 57 test variables as oblique
simple structure factors. In the present study, a
twentieth primary factor was added. So broad a spectrum
of primary abilities carefully transformed to
simple structure has not been available before; and
except for the absence of performances through different
sensory channels, this availability presented an
ideal opportunity to check higher-stratum structures.


Primary Ability Variables 


The variables used in the first phase of the study were
the 20 subtests from the Comprehensive Ability Battery
(CAB; Hakstian & Cattell, 1975a). This is a new battery
of 20 primary ability tests designed to provide
standardized measurement of the currently well-replicated
ability factors and was developed over a period
of 3 years, with factorial purity of the tests being the
primary consideration. Thus, this battery differs from
such earlier ability batteries as the Science Research
Associates’ (SRA) Primary Mental Abilities Test
(Thurstone & Thurstone, 1962) and the Differential
Aptitude Tests (Bennett, Seashore, & Wesman, 1969)
in that the latter two batteries measure only a small
number of the currently known primary mental abilities.
The philosophy underlying the construction and
refinement of the CAB was to measure a broad range of
well-replicated primary abilities, necessitating for
logistical reasons short tests (5 to 6 minutes) of each
ability. The ability tests of the CAB, which were the
20 variables used as the basis of the present study, are
described briefly in Table 1. For a full account of the
meanings of the factors represented by these tests, the
reader is referred to two earlier articles by Hakstian and
Cattell [citation illegible]. The one factor not appearing in
[illegible] study is Auditory Ability, which [illegible]
present Table 1.


Subjects 


Subjects in the study were 138 males and 142 females
in Grades 11 and 12, living in the greater [illegible],
Alberta, Canada, area. Six high schools participated
in the study, of which two were urban-area schools in
predominantly lower-middle-class districts, and four
were located in suburban or semirural laboring and
lower-middle-class areas. Subjects were predominantly
Caucasian, with a small number (perhaps 5%-10%) of
Orientals and native Indians. Approximately three
quarters of the subjects were in nonacademic high
school programs, that is, programs leading not to
university entrance but to job-related technical or
commercial training. The remaining one quarter of the
subjects were in university-entrance programs. The
male subjects ranged from 15 to 19 years old, with a
mean age of 17.10 years (SD = .91). Females ranged
from 15 to 19 years, with a mean age of 16.89 years (SD
= [illegible]). The study was conducted on entire classrooms
during English periods, and the sexes were approximately
equally represented in the classes. Participation
by subjects was not voluntary; teachers simply turned
over their students and class time for the study.


Administration and Scoring 


All testing was done in May 1973. The first 14 ability 
tests (Verbal Ability [V] through Esthetic Judgment [E] 
in Table 1) were administered in all cases in a single 
morning session at each of the six schools. A break of 
approximately 30 sec was allowed between each pair of 
tests, and a 20-minute rest period was given between 
Tests 7 (Flexibility of Closure, or Cf) and 8 (Associative 
Memory, or Ma). For some subjects, the remaining six 
tests were given the same afternoon; for the others, the 
remaining tests were given on another afternoon, 
ranging from 1 to 3 days after the morning testing ses 
sion. The afternoon tests were administered with 
breaks of approximately 30 sec between each pair. The 
total testing time, with instructions and breaks, was 
about 3½ hours. Scores were obtained simply by taking
the number of correct responses. No correction for 
chance was applied. 


Analysis Procedures 


Since significant sex differences in the means of
several of the ability variables had been found in a
previous study (Hakstian & Cattell, 1975b), it would
have been inappropriate simply to pool all 280 subjects
and obtain correlations. Instead, the separate-sex co-
variance matrices were obtained (which removed mean
differences) and were tested for homogeneity using
Box's (1949) procedure. This test revealed that the
separate-sex covariance matrices came from popula-
tions with equal dispersions, F(210, 235801) = 1.056, p
> .25, that is, the pattern of variances and covariances
among the 20 ability variables did not differ between the
sexes. Thus, the two covariance matrices were pooled,
were appropriately weighted, and were standardized,
yielding a pooled within-groups correlation matrix based
on all 280 observations. This correlation matrix
is presented in Table 2.
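
The pooling step described above can be sketched in a few lines. This is a minimal illustration, not the authors' program: the function names and the toy scores are hypothetical, and the weighting by group degrees of freedom (n - 1) is our assumption about what "appropriately weighted" means. It shows that a mean difference between groups does not affect the pooled within-groups correlation.

```python
from math import sqrt

def covariance_matrix(rows):
    """Unbiased covariance matrix of a list of equal-length score vectors."""
    n = len(rows)
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]

def pooled_within_correlation(groups):
    """Pool group covariance matrices weighted by (n_g - 1), then standardize."""
    k = len(groups[0][0])
    total_df = sum(len(g) - 1 for g in groups)
    pooled = [[0.0] * k for _ in range(k)]
    for g in groups:
        cov = covariance_matrix(g)
        weight = len(g) - 1
        for i in range(k):
            for j in range(k):
                pooled[i][j] += weight * cov[i][j] / total_df
    sd = [sqrt(pooled[i][i]) for i in range(k)]
    return [[pooled[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]

# Hypothetical toy data: two tests, two groups whose means differ by 10 points.
males = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 4.0)]
females = [(x + 10.0, y + 10.0) for x, y in males]

r_within = pooled_within_correlation([males, females])
```

Simply pooling all eight toy cases would inflate the correlation, because the 10-point group difference would be counted as covariation between the two tests; the within-groups matrix is free of that artifact.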

The factor analytic investigation involved two phases.
These are described separately below.

Phase 1. The correlation matrix in Table 2 was
subjected to a principal-component analysis and a
maximum-likelihood common-factor analysis. The
purpose of the component analysis was to obtain the 
latent roots of the correlation matrix for use in deciding 
on the correct number of factors to retain. The Kai- 
ser-Guttman rule of retaining as many factors as there 
are latent roots exceeding 1.0 of the correlation matrix 
and Cattell's (1966) scree test on the latent roots both 
suggested that six factors be retained. Results of the 
likelihood ratio tests associated with the maximum- 
likelihood common-factor analyses yielded the following 
results: For four factors, χ²(116) = 158.15, p < .01; for
five factors, χ²(100) = 113.04, p = .176; and for six fac-
tors, χ²(85) = 75.10, p = .77. From these latter results,
it appeared that five factors might be adequate; but
because of the results of the other two tests and a desire
to avoid underfactoring, we retained six factors. The 
six maximum-likelihood factors yielded residual cor- 
relations in which all but 2 of the 190 were less than .10 
in absolute value (.12 and .13) and 153 (81%) were less
than .05, indicating completeness of factorization. The 
six-factor maximum-likelihood common-factor solution 
was then transformed to an oblique simple structure. 
The transformation was carried out by first obtaining 
an analytic oblique solution by the Harris-Kaiser 
(Harris & Kaiser, 1964) procedure, the A'A proportional 
to L solution being the best simple structure. This 
solution was substantially improved by application of 
the rotoplot method (Cattell & Foster, 1963) of visual 


Table 1
Primary Ability Variables Used in the Analyses (Subtests of the
Comprehensive Ability Battery)

                                         Work time   No.
Primary ability                          (minutes)   items(a)  Reliability(b)  Content
 1. Verbal Ability (V)                   5           20        .78             Vocabulary, proverbs
 2. Numerical Ability (N)                5           20        .19             Numerical computation
 3. Spatial Ability (S)                  4½          72        .86             Two-dimensional figures
 4. Speed of Closure (Cs)                5           20        .71             Completing the gestalt (words)
 5. Perceptual Speed and Accuracy (P)    4½          72        .64             Evaluating symbol pairs
 6. Inductive Reasoning (I)              5½          12        .74             Reasoning with letter sets
 7. Flexibility of Closure (Cf)          5           12        ...             ...
 8. Associative Memory (Ma)              5½          14        .19             Memorizing design-number ...
 9. Mechanical Ability (Mk)              6           18        .72             Mechanical principles, tools, ...
10. Span Memory (Ms)                     8           75        .96             Digit span (auditorily presented)
11. Meaningful Memory (Mm)               5½          20        .84             Memorizing noun-descriptor ...
12. Spelling (Sp)                        5           20        .78             Identifying misspellings
13. Auditory Ability (AA)                10          56        .80             Pitch discrimination, ... memory
14. Esthetic Judgment (E)                9           36        ...             Preferences for abstract designs
15. Spontaneous Flexibility (Fs)         5           NA        .81             Multiple grouping ...
16. Ideational Fluency (Fi)              ...         NA        .81             Listing attributes of nouns
17. Word Fluency (W)                     ...         NA        .78             Anagrams
18. Originality (O)                      ...         16        .68             Object synthesis (constructing
                                                                               syntheses of two objects)
19. Aiming (A)                           5           NA        .78             Making finely controlled pencil ...
20. Representational Drawing (RD)        6           NA        .59             Drawing reproductions of presented ...

Note. Several of the subtests used in the present study have subsequently
been revised for the newest form of the Comprehensive Ability Battery.
Specifically, V is now 24 items long and ...; N now has a ...-minute work
time; I has a 6-minute work time; AA has 32 items with a 6-minute work
time; ... has 26 items with a 6-minute work time; and E has ...
(a) In this column, NA means not applicable and indicates that the test is
open ended.
(b) Reported reliabilities are for the final Comprehensive Ability Battery
tests, as assessed on a homogeneous sample ... normative sample. The
reliability sample consisted of 144 16-year-old males and 136 16-year-old
females from a single ... Reliabilities of tests V, I, Cf, Ma, Mk, Mm, Ms,
Sp, ... were obtained by the (corrected) split-half method; those of the
speeded tests N, S, Cs, and P, by the retest method; and those of Fs, Fi,
W, ..., A, and RD, by the use of separately timed parts. ... reported
reliabilities are the means of reliabilities obtained separately ...
transformation. The percentage of primary-factor
pattern coefficients falling in the factor hyperplanes
increased from 52% (Harris-Kaiser) to 55% (rotoplot) for the
wide band (for the present sample size) of 0 ± .10 and,
more importantly, from 28% (Harris-Kaiser) to
[illegible]% (rotoplot) for the more appropriate band width of
0 ± .05. After six cycles, each consisting of 15 single-plane
shifts, a plateau in the hyperplane count was
reached, and further transformation was deemed un-
necessary.
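
The two number-of-factors checks used in Phase 1 can be verified with a short sketch. The degrees-of-freedom expression below is the standard one for the likelihood ratio test of m common factors among p variables, and it reproduces the values reported in the text. The function names are ours, and the list of latent roots is hypothetical, supplied only to show how the Kaiser-Guttman count works; the actual roots are not given in this section.

```python
def ml_factor_test_df(p, m):
    """Degrees of freedom of the likelihood ratio test for m factors in p variables."""
    return ((p - m) ** 2 - (p + m)) // 2

def kaiser_guttman_count(latent_roots):
    """Number of latent roots of the correlation matrix that exceed 1.0."""
    return sum(1 for root in latent_roots if root > 1.0)

# Phase 1: 20 primary abilities, tests for four, five, and six factors.
df_primary = {m: ml_factor_test_df(20, m) for m in (4, 5, 6)}

# Phase 2: 6 second-stratum factors, tests for two and three factors.
df_second = {m: ml_factor_test_df(6, m) for m in (2, 3)}

# Hypothetical latent roots (not the values obtained in the study).
example_roots = [5.1, 2.0, 1.6, 1.3, 1.2, 1.05, 0.9, 0.8]
n_retained = kaiser_guttman_count(example_roots)
```

The zero degrees of freedom for three factors among six variables is why no likelihood ratio test was available for the three-factor solution at the higher stratum.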


Phase 2. The 6 × 6 intercorrelation matrix of sec-
ond-stratum factors produced at Phase 1 was subjected
to the same factoring procedures as the original corre-
lation matrix. Both the Kaiser-Guttman and scree
tests suggested three third-stratum factors. Results of
a maximum-likelihood common-factor analysis sug-
gested that two factors were insufficient, χ²(4) = 27.01,
p < .0001, and the test had no degrees of freedom for
three factors. Three factors were thus decided upon,
and an unweighted least squares (or minimum residual)
common-factor solution was obtained for three factors.
None of the 15 residual correlations exceeded .055 in
absolute value, and 11 of the 15 were less than .03, again
indicating an adequate factoring. This solution was
transformed to an oblique simple structure as before by
an analytic oblique transformation using the Harris-
Kaiser (1964) procedure (this time the independent
cluster version being best) and, for further refinement,
using the rotoplot visual transformation method. After
five rotoplot cycles, no improvement in the percentages
of primary-factor pattern coefficients falling in the
factor hyperplanes (the width of which was defined
more broadly [to .13] because of the small number of
variables [6] and the increased probability of error in the
factored correlation matrix) had been realized over the
initial Harris-Kaiser oblique solution (33%); however,
excessively large negative loadings of -.17, -.28, and
-.33 had been substantially reduced by the rotoplot
shifts, in accordance with the general characteristic
found in ability factorings of a largely positive manifold.
Finally, to aid interpretation of the three third-stratum
factors, an oblique primary-factor pattern of the
third-stratum factors projected directly onto the pri-
mary ability variables was obtained by the Cattell-
White (Cattell, 1965) procedure.
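
The hyperplane count used to judge both transformations is simply the share of pattern coefficients lying within a chosen band around zero. The sketch below is illustrative only: the function name is ours, and the data are the seven rows of Table 3 that we could transcribe in full (decimal points omitted, as in the table), so the percentages it yields apply to that subset and are not the figures reported for the full 20 × 6 matrix.

```python
def hyperplane_count(pattern, band):
    """Count coefficients whose absolute value falls within the band."""
    return sum(1 for row in pattern for x in row if abs(x) <= band)

# Rows transcribed from Table 3 (V, S, P, I, Ma, Mm, RD); columns are
# Gc, Gf, Gv, Gps, Gm, Gr. Values are in hundredths.
legible_rows = [
    [59, -2, -6, 18, 5, 6],     # Verbal Ability
    [7, 68, 37, 3, 2, 4],       # Spatial Ability
    [-25, 40, 27, 37, -1, 0],   # Perceptual Speed and Accuracy
    [0, 42, 22, 23, 13, 2],     # Inductive Reasoning
    [3, 6, -2, 0, 66, -3],      # Associative Memory
    [12, 3, 1, 6, 38, 6],       # Meaningful Memory
    [-4, -7, 42, 7, 17, 3],     # Representational Drawing
]

n_cells = sum(len(row) for row in legible_rows)
wide = hyperplane_count(legible_rows, 10)     # band of 0 ± .10
narrow = hyperplane_count(legible_rows, 5)    # band of 0 ± .05
wide_pct = 100.0 * wide / n_cells
narrow_pct = 100.0 * narrow / n_cells
```

Working in hundredths keeps the band comparisons exact and avoids floating-point edge cases at the band limits.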


Results and Discussion 
Phase 1 


It can be seen from Table 2 that the gen-
eral characteristic, noted in previous work
on abilities, of a positive manifold among
these traits was clearly found in the present
study. The only negative correlations (4 of
190: -.01, -.02, -.02, and -.05) were cer-
tainly close enough to zero to be considered
sampling error (the standard error of a
correlation coefficient, given a pop-
ulation correlation of zero and 280 subjects,
is approximately .06). The final, optimal
oblique simple structure solution for the 20
primary ability variables is presented in 
Table 3 (this matrix is the primary-factor 
pattern matrix for the data). The inter- 
correlations among the six primary second- 
stratum factors are shown in Table 4. 
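
The standard error quoted above can be checked directly. The sketch assumes the usual large-sample approximation 1/sqrt(N - 1) for the standard error of a correlation when the population value is zero; the function name is ours.

```python
from math import sqrt

def se_zero_correlation(n):
    """Approximate standard error of r when the population correlation is zero."""
    return 1.0 / sqrt(n - 1)

se = se_zero_correlation(280)

# The four negative correlations reported for Table 2.
observed_negatives = [-0.01, -0.02, -0.02, -0.05]
all_within_one_se = all(abs(r) < se for r in observed_negatives)
```

With 280 subjects the value rounds to .06, and each of the four negative correlations lies within one standard error of zero.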

The interpretation of these factors, in 
terms of previous theoretical and empirical 
work, offers no difficulty, with the possible 
exception of Factor 4 (General Perceptual 
Speed). As hierarchical studies like the 
present one accumulate, such matching and 
identification must ultimately be done with 
congruence coefficients (Burt, 1940) or sa- 
lient variable similarity indices (Cattell, 
Balcar, Horn, & Nesselroade, 1969), but the 
rarity of second-stratum studies with a suf- 
ficient number of closely similar variables 
requires us to interpret the factors in terms 
of outstanding factor pattern coefficients 
(loadings). In most cases, therefore, we have 
labeled the obtained factors in accordance 
with earlier discovered patterns and theo- 
retical formulations, since this seems pref- 
erable when developing an emerging theory 
to proceeding de novo in each similarly de- 
signed study. 
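
For readers unfamiliar with the congruence coefficient mentioned above, a minimal sketch follows. It uses the common definition (the sum of cross-products of two loading columns divided by the square root of the product of their sums of squares). The two loading vectors are hypothetical and stand for the same five variables in two different studies.

```python
from math import sqrt

def congruence_coefficient(a, b):
    """Congruence between two columns of loadings on the same variables."""
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return numerator / denominator

# Hypothetical loadings of five shared variables on a factor in two studies.
study_a = [0.60, 0.55, 0.10, 0.05, -0.02]
study_b = [0.58, 0.50, 0.15, 0.00, 0.04]

similarity = congruence_coefficient(study_a, study_b)
```

The coefficient is unaffected by uniformly rescaling either column, which is why it serves for matching factors across studies whose loadings differ in overall size.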

Factor 1: Crystallized Intelligence (Gc). 
It seems clear that this factor is once again 
a manifestation of what has come to be 
known as Crystallized Intelligence, or Gc.
The factor has appreciable loadings only
from the abilities that are (of the 20 includ-
ed) most purely the result of experiential,
educative, and acculturation effects. Cat-
tell’s (1971) triadic theory postulates just
such a factor, which is shaped by the ac-
culturation influences present in a society
that harness or “crystallize” an individual’s
basic level of neural functioning into pre-
scribed and overlearned factual content and
problem-solving strategies. In Horn and
Cattell's (1966) study, essentially the same
factor was loaded most highly by Verbal
Ability, with loadings also from Mechanical
Knowledge, Numerical Reasoning (not
the same ability as the present
Numerical Ability because the latter involves
computation), and several other vari-
ables not included in the present study. In
an earlier study by us (Hakstian & Cattell,
1972), using a heterogeneous sample, a sec-
ond-stratum factor was found, with Verbal
Ability (V), Mechanical Ability (Mk) (as in
the present study), and Originality (O)
loading on it. The loading by O on the
present factor is probably too low (.19) to be
considered an important marker of Factor
1.

Factor 2: Fluid Intelligence (Gf). This 
factor, in contrast to Factor 1, appears in the 
present study (as it has in past studies) to be 
loaded by variables in which drawing upon 
stored, crystallized skills brings no advan- 
tage. The factor is best understood as a 
basic “eduction-of-relationships” trait that 
represents most purely the physiological 
substrate of intelligence and that is “fluid” 
in the sense of being able to be directed into 
almost any intellectual problem. Probably 
the best single marker of Gf is the culturally 
reduced tests of inductive reasoning, such as 
the letter sets variety used in the present 
study or the figural reasoning tests (e.g., the
Culture Fair Intelligence Tests) used by
Cattell and Horn in earlier studies. For this
factor, the earlier marker variables have, in
addition to the primary factor of Inductive
Reasoning, been Spatial Ability (which is
factorially complex), Numerical Ability (also
complex in some earlier studies), and a low
loading by Associative Memory (Cattell,
1963; Horn & Cattell, 1966), to which the
Hakstian and Cattell (1972) study added
Perceptual Speed and Accuracy. Thus, the
general sense of Factor 2 here is the same as
the previously discovered patterns for Gf.

Factor 3: Visualization Capacity (Gv).
It is clear from Table 3 that Factor 3 is in-
volved with all the primary ability variables
that require some form of visual organization
or integration of stimulus materials and has
little to do with those variables that do not.
Thus, we see substantial loadings by the two
visual-motor primaries of Aiming and


Table 3
Oblique Primary-Factor Pattern Matrix at Optimal Position of Simple Structure

                                            Second-stratum factor(a)
                                       Gc    Gf    Gv    Gps   Gm    Gr
Primary ability                        (1)   (2)   (3)   (4)   (5)   (6)   h²
Verbal Ability (V)                      59   -02   -06    18    05    06    52
Numerical Ability (N)                  -05    45    01    29    22   ...   ...
Spatial Ability (S)                     07    68    37    03    02    04    54
Speed of Closure (Cs)                   02    02    28    89   -33   -04    63
Perceptual Speed and Accuracy (P)      -25    40    27    37   -01    00    40
Inductive Reasoning (I)                 00    42    22    23    13    02    40
Flexibility of Closure (Cf)            ...   ...   ...   ...   ...   ...   ...
Associative Memory (Ma)                 03    06   -02    00    66   -03    46
Mechanical Ability (Mk)                 57    17   ...   ...   ...   ...    40
Span Memory (Ms)                       -13    05    00    31    11    22   ...
Meaningful Memory (Mm)                  12    03    01    06    38    06    25
Spelling (Sp)                           10   -18    03    63   ...   -04    42
Auditory Ability (AA)                   17    00    24   ...   ...   ...    18
Esthetic Judgment (E)                   09    01    06   ...   ...    32   ...
Spontaneous Flexibility (Fs)            13    24    23   ...   ...   ...   ...
Ideational Fluency (Fi)                 01    28   ...   ...   ...   ...   ...
Word Fluency (W)                        07    09   ...   ...   ...   ...   ...
Originality (O)                         19    02   ...   ...   ...   ...   ...
Aiming (A)                              00   -06    44   ...   ...   -01    26
Representational Drawing (RD)          -04   -07    42    07    17    03    21

Note. Decimal points have been omitted. Salient factor pattern coefficients,
or loadings used to interpret ... Gc = Crystallized Intelligence, Gf = Fluid
Intelligence, Gv = Visualization Capacity, Gps = General Perceptual Speed,
Gm = General Memory Capacity, and Gr = General Retrieval Capacity.
(a) Numbers in parentheses are factor numbers.


Table 4
Intercorrelations Among the Second-Stratum Factors

Second-stratum factor                    1     2     3     4     5     6
Crystallized Intelligence (Gc)         100
Fluid Intelligence (Gf)                 17   100
Visualization Capacity (Gv)            -12   -21   100
General Perceptual Speed (Gps)          35    36   -01   100
General Memory Capacity (Gm)            31    17    18    53   100
General Retrieval Capacity (Gr)         30   -05    02    30    22   100

Note. Decimal points have been omitted.


Representational Drawing as well as by
Spatial Ability; and we see salient, but
somewhat lower, loadings by the three per-
ceptual ability primaries of Speed of Closure,
Flexibility of Closure, and Perceptual Speed
and Accuracy. The loading by Mechanical
Ability is understandable, given the amount
of visualization required by many items of
this test (involving gears, pulleys, etc.).
Thus, the meaning of this factor appears
clear and corresponds to that of a similar
factor found in earlier research using some
of the same variables as in the present study
(see Horn & Cattell, 1966; Cattell, 1971, pp.
106-107). The loading by Representational
Drawing, which is a variable not included in
earlier research, was present in the same
factor found in Hakstian and Cattell's (1972)
study.

We postpone discussion of Factor 4 until 
after the next two second-stratum factors. 
Factor 5: General Memory Capacity
(Gm). From the results in Table 3, this
factor would have to be considered a General
Memory Capacity, although it was not suf-
ficiently well replicated by 1971 and thus
does not appear as a solidly established sec-
ond-stratum factor in Cattell’s (1971) book.
Certainly, the earlier work of Kelley (1964)
had suggested several memory abilities at
the primary ability level concerned with
committing material to memory, and Hak-
stian and Cattell (1974) demonstrated the
discriminability of Associative Memory
(Ma), Meaningful Memory (Mm), and Span
Memory (Ms) used in the present study. In
their 1974 study, Hakstian and Cattell ar-
gued for a conceptualization of Ms not gen-
erally consistent with memorizing ability but
more a matter of immediate receptivity, at-

tentiveness, or simply keeping a range of 
stimuli in immediate awareness. Hakstian 
and Cattell (1972) found a factor similar to 
the present Factor 5, being loaded as here by 
the Associative and Meaningful Memory 
abilities, which are two abilities that differ 
primarily in the meaningfulness of the 
long-term association formed. The absence 
of any loading by Ms on this factor, along 
with more cognitive-process interpretations 
of the cognitive activity performed, supports 
this view. Thus, Gm should likely be re- 
garded as a goodness-of-retention factor, 
with the discriminability of Ma and Mm 
found at the first stratum being largely a 
function of the method of committing the 
material to memory (see, e.g., Cattell, 1971, 
p. 42). In traditional terms, this factor 
would have to be considered long term. A 
long-term commitment to memory of various 
numerical results could be expected to ac- 
count for the loading on Gm of the Numeri- 
cal Ability variable.

Factor 6: General Retrieval Capacity 
(Gr). The interpretation of retrieval for this 
factor stems from the notion that what is 
shared by the salient variables for this factor, 
particularly Ideational Fluency (Fi), Origi- 
nality (O), and Spontaneous Flexibility (Fs), 
is a capacity for the retrieval of concepts or 
items from long-term memory storage (see 
Cattell, 1971, p. 108). The factor thus differs 
from Factor 5 (Gm) in the sense that the 
committing-to-memory-storage component 
is of little importance, whereas the capacity 
to call up ideas rapidly and in quantity is the
central feature. This factor has been found 
in various forms in past research (Bernstein, 
1924; Cattell, 1971, pp. 106-108; Horn & 
Cattell, 1966) to be loaded by fluency-type


Table 7
Intercorrelations Among Third-Stratum
Factors

Third-stratum factor                        1     2     3
1. Original Fluid Intelligence (α)        100
2. Capacity to Concentrate (β)            -33   100
3. School Culture (γ)                      22    20   100

Note. Decimal points have been omitted.


can be considered very reliable because of 
the absence of sufficient hyperplane an- 
choring variables. A tentative result must 


be stated, therefore, and left to the further


accumulation of findings. Following the 
convention found in several factor analytic 
sources, the present authors have used Greek 
letters for the third-stratum traits. 

Factor α: Original Fluid Intelligence.
Original (or “Historical” in Cattell, 1971)
Fluid Intelligence fits the theory summa-
rized above and earlier empirical results
(Cattell, 1963, 1967), being loaded by 
Present Fluid Intelligence most highly and 
Crystallized Intelligence to a lesser degree. 
Presumably, as part of general cortical effi- 
ciency, this factor also has considerable in- 
fluence on the General Memory capacity 
(Gm) and on the general Perceptual Speed 
(or Speed of Closure) factor, although the 
Perceptual Speed factor can be seen to be 
influenced to a lesser extent by the School 
Culture factor (γ) and the second factor.
To further elucidate the third-stratum 
factors, their projections directly onto the 
primary ability variables are presented in 
Table 6 using the Cattell-White (Cattell,
1965) procedure. Characteristically, such 
a matrix does not exhibit a clear simple 
structure if the strata are truly hierarchical, 
although occasionally one may do so. Also, 
the pattern coefficients, or loadings, tend to 
be of only moderate magnitude. Regarding 
Factor α, or Original Fluid Intelligence, its
breadth of ultimate effect, leaving unin- 
fluenced only such primary abilities as 
Aiming, Representational Drawing, Esthetic 
Judgment, Originality, Ideational Fluency, 
Auditory Ability, and so on, can be seen from 
the first column of this matrix. 
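
As we understand the Cattell-White procedure, the projection of a higher stratum directly onto the primary variables is obtained by multiplying the lower-stratum factor pattern matrix by the higher-stratum factor pattern matrix. That reading is an assumption on our part, since the formula is not given in this section, and the small matrices below are hypothetical values chosen only to show the arithmetic.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Hypothetical pattern of four primaries on two lower-stratum factors.
primaries_on_lower = [
    [0.6, 0.0],
    [0.5, 0.1],
    [0.0, 0.7],
    [0.1, 0.4],
]

# Hypothetical pattern of the two lower-stratum factors on one higher factor.
lower_on_higher = [
    [0.8],
    [0.5],
]

# Direct loadings of the higher-stratum factor on the four primaries.
primaries_on_higher = matmul(primaries_on_lower, lower_on_higher)
```

Because each direct loading is a sum of products of two coefficients that are themselves below 1.0, the resulting values tend to be of only moderate size, in line with the remark above about such matrices.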

Factor γ: School Culture. This factor,
as is seen from Tables 5 and 6, can be rea- 
sonably hypothesized to represent the effects 
of acculturation primarily resulting from 
formal schooling—a hypothesis that could 
be tested by correlating scores on this factor 
among, say, 18 year olds, with number of 
years and achievement level in school. 
Crystallized Intelligence and the General
Retrieval Capacity are the major markers,
and Present Fluid Intelligence takes no
contribution whatever from it. The reser- 
voir theory of retrieval capacity (Cattell, 
1971) contends that the number of words, 
ideas, and so on recalled in a stated time is 
partly a function of some accessibility power 
but largely of the sheer volume contained in 
the reservoir. School contributes to this, 
regardless of intelligence, to the extent that 
it can be seen as a part of Jensen’s (1969) 
Level 2 abilities constellation. In the Cat-
tell-White solution in Table 6, Factor γ
(School Culture) gets its highest loadings 
from Verbal Ability, Ideational Fluency, 
Originality, Word Fluency, Spelling, Es- 
thetic Judgment, and Spontaneous Flexi- 
bility, which fits the earlier hypothesis rea- 
sonably well. 

Factor β: Capacity to Concentrate. The
third-stratum factor for which only quite
speculative and doubtful hypotheses can be
given at present is Factor β. From the ab-
sence of Fluid and Crystallized Intelligence
and General Retrieval Capacity as markers,
it would seem to be a product of neither that
form of cortical neural mass and functional
efficiency that constitutes the basis of in-
telligence nor of experience. A tentative
label would be “Capacity to Concentrate” or
perhaps, “Cortical Alertness,” which is
somewhat akin to the personality trait
“Cortertia” (Cattell, 1957, 1973; see also
Hakstian & Cattell, 1978, in which certain
personality factors have been shown to cast
their influence onto the abilities domain),
inasmuch as this factor affects the capacity
to commit material to memory, the power or
capacity to visualize effectively, and the ca-
pacity for perceptual speed (to a lesser de-
gree).

A further examination of
Factor β, using the Cattell-White solution,
reveals that what may be involved, in addi-
tion to the Capacity to Concentrate (as we
have tentatively labeled this factor), is a
certain degree of effort, motivation, exertion,
or perseverance. If one notes the primary
ability variables that load on Factor β, that
is, Numerical Ability, Speed of Closure,
Perceptual Speed and Accuracy, Inductive
Reasoning, Associative Memory, Spelling,
Aiming, and Representational Drawing, it
can be seen that all of these primary ability
tests—with the possible exception of Spell-
ing—require an active and motivated in-
volvement in the task if high scores are to be
obtained (unlike, e.g., the Verbal and Me-
chanical Ability tests, in which the subject
either does or does not recognize the correct
answer). For some of these abilities, a cer-
tain level of functional intelligence is also
required, and thus, the primary abilities of
Numerical Ability, Speed of Closure, Per-
ceptual Speed and Accuracy, Inductive
Reasoning, Associative Memory, and Spell-
ing load on Factor α to an even greater ex-
tent, and Spelling also loads on Factor γ
(School Culture), as might be expected.
Significantly, the two primary abilities that
require little functional intelligence, in the
ordinary sense of the term, that is, Aiming
and Representational Drawing, have large
and sole loadings on Factor β.


Summary and Integration 


1. Two main theoretical systems con-
cerning the structure of human abilities are
currently opposed: (a) that of Guilford
(1967), hypothesizing numerous orthogonal
primary factors; and (b) that of Cattell
(1971), postulating a series of ability
strata—stated in fullest detail in the triadic
theory—involving primary agencies, pro-
vincial powers, and general capacities.

2. Previous empirical work by Hakstian
and Cattell (see Hakstian & Cattell, 1974, as
example) has shown that the optimization
of simple structure has placed some 20 pri-
mary abilities (agencies) definitely in mu-
tually oblique positions. The present study
was designed to extend our understanding
of these relationships by exploring higher-
stratum structures.

3. In a sample of 280 subjects, 6 common
factors emerged from the 20 primary abili-
ties; after careful oblique transformation of
these factors by both analytic and visual 
methods, a clear simple structure resulted. 
These factors were interpreted as the fol- 
lowing higher-stratum capacities: 

a. Crystallized Intelligence (Gc) had 
loadings predominantly from the culture- 
related variables of Verbal and Mechanical 
Abilities. 

b. Fluid Intelligence (Gf) had marker 
loadings from the generally nonacculturated 
Spatial Ability, Inductive Reasoning, and 
Perceptual Speed and Accuracy variables 
and also from the Numerical Ability pri- 
mary, which involves mainly speedy com- 
putation. 

c. Visualization Capacity (Gv) had major 
loadings from Spatial and Mechanical 
Abilities, Aiming, and Representational 
Drawing, and lesser but salient loadings from 
the perceptual abilities of Speed of Closure, 
Flexibility of Closure, and Perceptual Speed
and Accuracy—all of which involve imagin- 
ing reorientations of objects in space, keep- 
ing a certain visual orientation fixed, com- 
pleting the gestalt, and so on. 

d. General Perceptual Speed (Gps) had 
marker loadings from the Speed of Closure 
ability, Perceptual Speed and Accuracy, 
Spelling, and Word Fluency (the latter as- 
sessed in the CAB by an anagrams-type 
test). The main loadings here appear to 
favor a Perceptual Speed or Speed of Closure 
interpretation, rather than plain speed; but 
if so, a broad higher-stratum capacity found 
previously (Cattell, 1971; Horn & Cattell, 
1966) has been missed. 

e. Memory Capacity (Gm), or the ca- 
pacity for committing material to long-term 
memory, loaded saliently by the Associative 
and Meaningful Memory abilities and to a
lesser extent by Numerical Ability. 

f. General Retrieval Capacity (Gr) rep-
resents the retrieval-from-memory-storage
capacity, with marker loadings from Idea- 
tional Fluency, Originality, Spontaneous 
Flexibility, and Esthetic Judgment. The 
fact that the Word Fluency (W) ability does 
not load on it, as might be expected, is 
probably best explained by the anagram 
nature of the W test, which likely places as 
much emphasis on perceptual closure as on
retrieval of words.

4. The second-stratum factor intercor-
relations were factored, and the three re-
sulting third-stratum factors were trans- 
formed both analytically and visually to 
simple structure positions. These third- 
stratum factors were interpreted as fol- 
lows: 

a. Original Fluid Intelligence (Factor α)
was the largest of the third-stratum factors 
and represents that Fluid Intelligence which 
persists as Present Fluid Intelligence (Gf) 
but which also was invested developmentally 
in Crystallized Intelligence (see Cattell, 1971, 
for a full account of the investment theory). 
In the present study, this factor appears 
consistent with this explanation, but the 
factor also seems to have had an even greater
influence on the memory and perceptual
speed capacities.

b. Capacity to Concentrate (Factor β)
casts its influence primarily on the visual- 
ization and memory capacities. This factor 
is certainly the most tentative of the third- 
stratum traits. 

c. School Culture (Factor γ) represents
the accumulated effects of various accultu- 
ration opportunities, most notably formal 
schooling. This factor, as might be expect- 
ed, has its greatest influence on the Crys- 
tallized Intelligence and General Retrieval 
Capacities and, to a lesser extent, on Per- 
ceptual Speed or Speed of Closure. 

Because this stratum has been so rarely 
explored (see Cattell, 1971, p. 116), with only 
some suggestion of what we have termed 
Original Fluid Intelligence, and because the 
small size of the correlation matrix factored 
leaves some doubt regarding transformation, 
it is probably not worthwhile to attempt to
discuss further theory until more such
studies have accumulated.

5. With the exception of one factor, the 
second-stratum factors agree in meaning
with those previously found (Cattell, 1963, 
1967; Horn & Cattell, 1966) and with the 
theoretical formulations of Cattell (1971); 
and it is concluded that a second- and 
third-stratum formation, as required by 
the triadic theory, is supported. However, 
the only pattern yet located here corre- 
sponding to a sensory power (a P in the 
triadic theory) is Visualization Capacity. 
Since only one auditory primary ability was
included and none in any other sensory area
besides the visual, this fact does not disprove
the hypothesis of higher-stratum factors in
addition to the general capacities, factors 
corresponding to “provincial powers” in 
sensory domains. Obviously, the replica- 
bility of several higher-stratum structures is 
incompatible with the orthogonal single- 
stratum structure in the Guilford system. 
More constructively, however, it points to 
consistent patterns for these higher-stratum 
structures and invites further interpretive 
research.


Effects of Desegregation on Race Relations and Self-Esteem


Walter G. Stephan
New Mexico State University

David Rosenfield
Southern Methodist University


This study examined the effects of desegregation on the interethnic attitudes,
interethnic contact, and self-esteem of blacks, whites and Mexican Americans.
The pre-desegregation sample consisted of fifth- and sixth-grade students at-
tending segregated and naturally integrated elementary schools. The post-
desegregation sample was comprised of sixth-grade students from both types
of background who were attending desegregated schools. The attitude and
contact data demonstrated that all three groups were highly ethnocentric.
The results also indicated that white and black students from segregated
backgrounds had more negative attitudes toward both out-group and in-group
members after desegregation than before. Desegregation had little effect on
interethnic contact or self-esteem.


School desegregation and its effects con- 
tinue to be one of this country’s most con- 
troversial social issues more than 20 years 
after the Supreme Court decision that 
mandated it. Recent reviews of the litera- 
ture in this area suggest that the effects of 
desegregation on prejudice, interethnic 
contact, and self-esteem are complex: 
Sometimes the changes are positive, and 
sometimes they are negative; but most often, 
there are no changes, or the effects are mixed 
(Armor, 1972; Epps, 1975; St. John, 1975; 
Stephan, 1978; Weinberg, 1970). In the 
brief review of the literature that follows, the 
effects of desegregation on prejudice will be 
presented first, followed by reviews of the 
research on interethnic contact and self- 
esteem. 

There have been seven published studies 
of the effects of desegregation on prejudice. 
Only one of these studies found that deseg- 
regation reduced prejudice (Webster, 1961); 
and in this study, a reduction in prejudice 
was found only for blacks’ attitudes toward 
whites. Three studies have obtained pre-
dominantly negative results (Armor, 1972;
Dentler & Elkins, 1967; Green & Gerard, 
1974). There are also three studies in which 
no differences were found between segre- 
gated and desegregated schools (Horowitz, 
1952; Silverman & Shaw, 1973; Williams, 
Best, & Boswell, 1975). Thus, it appears 
that the effects of desegregation on prejudice 
are rarely positive and sometimes nega- 
tive. 

Three types of quasi-experimental designs 
have been used in these studies: cross-sec- 
tional, longitudinal, and factorial. Cross- 
sectional studies are often done in school 
districts where natural integration has oc- 
curred. For this reason, differences between 
segregated and desegregated students may
not be due to school desegregation but to a
number of other factors, such as social class.
Longitudinal designs generally have been
employed when a school district initiates a
desegregation program. The major prob-
lems with longitudinal designs involve his-
tory and maturation effects (Campbell &
Stanley, 1963). Studies employing factorial
designs have typically been done in school
districts in which some type of partial de-
segregation plan has been implemented.
The major advantage provided by factorial
designs is the availability of a segregated
control group.

In addition to the restrictions imposed by
the different designs that have been em-
ployed, there are other differences among
these studies that limit the conclusions that
can be made. Characteristics of the com-
munities, such as degree of residential seg-
regation, ratio of minority students, the age
of the students, size of community, region,
degree of opposition to desegregation,
within-school segregation (tracking), and de-
segregation of teachers, vary considerably
across studies. Because of the method-
ological weaknesses and the many differ-
ences among these studies, it is impossible
to draw firm conclusions concerning the ef-
fects of desegregation on prejudice.

The two studies that have examined the 
effect of desegregation on interethnic contact 
found that desegregation had little or no ef- 
fect (Gottlieb & ten Houten, 1965; Silverman 
& Shaw, 1973). In addition to these studies 
of actual contact, St. John (1975) lists 19 
published and unpublished studies that have 
examined the effects of desegregation on 
sociometric choices. For both blacks and
whites, a majority of the sociometric studies
find no effects or mixed effects (62% for
whites and 64% for blacks). A smaller
number of studies find negative effects (19%
for whites and 29% for blacks) or positive
effects (19% for whites and 7% for blacks).
These studies are subject to many of the
same criticisms that apply to the studies of
prejudice. However, there are additional
problems as well. For example, St. John
(1975) points out that controlling for racial
percentages often eliminates the appearance
of an increase in sociometric choices of out-
group members. A number of studies,
however, have not employed this control.

There have been eight studies of the ef-
fects of desegregation on black self-esteem.
Three of these studies found that desegre-
gation had negative effects on self-esteem
(Bachman, 1970; Coleman et al., 1966;
Powell & Fuller, 1970), and five found that
desegregation had no effects on black self-
esteem (Gerard & Miller, 1975; Knight,
White, & Taff, 1972; Rosenberg & Simmons,
1971; St. John, 1971; Williams & Byars,
1968). Based on these studies, it appears
that desegregation sometimes has negative
effects on black self-esteem and never has
positive effects. Several of these studies also
examined the effects of desegregation on the
self-esteem of white students. Two studies
found that desegregation had no effect on
whites’ self-esteem (Gerard & Miller, 1975;
St. John, 1971), and one suggests that de-
segregation may be associated with increases
in whites’ self-esteem (Coleman et al.,
1966).

A general criticism that applies to many 
of the studies reviewed above is that they 
only examined one effect of desegregation, 
or they failed to examine the interrelation- 
ships among the variables included. The 
present study was designed to examine not 
only the effects of desegregation on preju- 
dice, interethnic contact, and self-esteem but 
also to examine the interrelationships among 
these variables. The predesegregation 
subjects were fifth- and sixth-grade students 
from segregated and naturally integrated 
elementary schools. Students from these 
same neighborhoods were then surveyed 8 
months after desegregation. The students 
from naturally integrated neighborhood 
schools constitute a unique control group in 
this study. These children had frequent 
opportunities for interethnic contact both 
before and after desegregation. Thus, these
students provide a control for history and 
maturation effects. Another important as- 
pect of the present study is that the school 
system that was investigated has a triethnic 
student population (blacks, whites, and 
Mexican Americans), which makes it possi- 
ble to examine the effects of desegregation 
on attitudes and behavior toward two out-
groups.


Method 


gated as a consequence of court orders in 1971 
and 1973. Students in Grades 1-5 attended neigh- 


i pulation for the con 
dy consisted of all of the students attending the fifth
and sixth grades of five segregated elementary schools
(the mean perce! 
was 94%) an 
schools (less tl 
ethnic group). 
schools in term 
participated in the 
permission of his or h
was gathered 2 years before desegregation began. The
post-desegregation population consisted of students 
from the same neighborhoods 8 months after a court- 
ordered desegregation plan for the sixth grade was im- 
plemented. To obtain the post-desegregation samples, 
all students attending the nine sixth-grade centers who 
had previously attended one of the elementary schools 
studied in the pre-desegregation phase of the study were 
given parental permission forms. This procedure was 
followed in order to make the pre- and post-desegre- 
gation samples as equivalent as possible. Overall, a 
relatively low percentage (8%) of parents refused to 
allow their children to participate in the study. 
The basic design was a 2 × 2 × 3 factorial. The first
factor concerns whether the students were surveyed 
before or after desegregation, the second factor concerns 
whether the students came from neighborhoods where 
the elementary schools were segregated or naturally
integrated, and the third factor consists of the three 
ethnic groups. The total sample was comprised of 309 
blacks, 487 Mexican Americans, and 528 whites. The 
students completed the questionnaire in a group setting. 
White administrators explained and illustrated the 
response formats and read each question to the stu-
dents. Students in the integrated and desegregated
classrooms were separated by ethnic group for the ad- 
ministration of the questionnaire. The purpose of 
separating the students was to obtain evaluations of 
ethnic outgroups that were as free as possible from the 
biases that could potentially occur in a multiethnic 
group administration setting where the students could 
observe each other's responses. 
The questionnaire contained measures designed to 
assess interethnic attitudes, interethnic contact, and 
self-esteem. Interethnic attitudes were measured with 
10-item scales composed of bipolar trait dimensions set
up in a 9-point semantic differential format. The 10
bipolar dimensions were the following: friendly versus 
unfriendly, emotional versus unemotional, trustworthy 
versus untrustworthy, religious versus not religious, 
athletic versus unathletic, selfish versus unselfish, 
similar to me versus dissimilar to me, attractive versus 
unattractive, easy to understand versus difficult to 
understand, industrious versus unindustrious. For 
each dimension, the administrator asked, "What would 
you say most — [ethnic group] are like? Are they 
— [one end of the continuum], or — [the opposite end 
of the continuum], or somewhere in between?" After 
completing their evaluation of all three groups, the 
students were asked to indicate which pole of the bi- 
polar continuum they considered it better for a person 
to be. These ratings were used to scale the items from 
1 (positive) to 9 (negative). The sum across the 10 items 
was used as an index of how negatively the student 
evaluated the members of each group. The mean in-
ternal consistency (Cronbach’s alpha) of this scale was


Interethnic contact was measured with a 9-item scale.
The items selected for this scale were designed to assess
voluntary informal contact. The questions asked the
subjects to indicate how much contact (frequently,
sometimes, or never) they had had with members of
each of the three ethnic groups in the following situa-
tions: “have talked to,” “played a game with at a
playground,” “swam in the same swimming pool,” “been
to a party where they were,” “been to their house to
visit,” “been to a store where they were shopping,”
“been to a movie theater where they were,” “called
and talked on the telephone,” and “have brought home
after school to play.” The contact scores were derived
by assigning a score of 1 to responses of never, a 2 to
responses of sometimes, and a 3 to responses of fre-
quently. The sum across all nine situations was used
as an index of the subject’s contact with each group.
The mean internal consistency for this scale was .82.
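
The internal-consistency figures reported for these scales are Cronbach's alpha values. As a minimal sketch of the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), the Python function below may help readers who want to reproduce such a coefficient. The function name and the toy response matrix are illustrative assumptions and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items in the scale
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's summed scale score
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of four respondents to a 3-item scale (not study data)
toy = [[1, 2, 1], [2, 2, 3], [3, 3, 2], [1, 1, 1]]
alpha = cronbach_alpha(toy)
```

Alpha approaches 1 when the items rise and fall together across respondents and drops toward 0 when they vary independently.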
The self-esteem measure was composed of 7 items.
The items were the following: “I would say that I am
happy”; “I wish that I were different from the way I
am”; “When I meet someone new, I usually think that
he is better than I am”; “Other people often criticize
me”; “I feel comfortable talking with my teacher”; “I
like the way I look”; “I can’t seem to get other people to
notice me.” The response format consisted of a 5-point
Likert scale (reverse coding where appropriate). The
sum across the 7 items was used as the measure of self-
esteem. This measure had a mean internal consistency
of .46. On a separate sample of fifth-grade students,
this measure was found to correlate significantly with
the scale developed by Coopersmith (1967). The cor-
relation was r = .75, p < .01 (n = 38) for the whites; r =
.50, p < .01 (n = 48) for the blacks; and r = .32, p <i
(n = 54) for the Mexican Americans.
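
The three scale scores described in this section can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the coding of a marked position as 1 for the first-named pole is assumed, and the set of reverse-coded self-esteem items is left to the caller because the text says only that reverse coding was applied "where appropriate."

```python
# Point values for the three contact response options named in the text
CONTACT_POINTS = {"never": 1, "sometimes": 2, "frequently": 3}


def attitude_score(positions, first_pole_is_better):
    """Sum of 10 semantic-differential items, each scaled 1 (positive) to 9 (negative).

    positions: marked position on each 9-point continuum (assumed 1 = first-named pole).
    first_pole_is_better: the student's own judgment of which pole is better.
    """
    if len(positions) != 10 or len(first_pole_is_better) != 10:
        raise ValueError("the attitude scale has 10 items")
    total = 0
    for pos, better in zip(positions, first_pole_is_better):
        if not 1 <= pos <= 9:
            raise ValueError("positions must lie between 1 and 9")
        # Flip the item when the second-named pole is the one judged better
        total += pos if better else 10 - pos
    return total  # possible range: 10 (most positive) to 90 (most negative)


def contact_score(responses):
    """Sum of the 9 contact items; possible range 9 to 27."""
    if len(responses) != 9:
        raise ValueError("the contact scale has 9 items")
    return sum(CONTACT_POINTS[r] for r in responses)


def self_esteem_score(answers, reverse_items):
    """Sum of 7 five-point Likert items; reverse_items holds 0-based indices to flip."""
    if len(answers) != 7:
        raise ValueError("the self-esteem scale has 7 items")
    return sum(6 - a if i in reverse_items else a for i, a in enumerate(answers))
```

Under this scoring, higher attitude sums mean more negative evaluations, while higher contact and self-esteem sums mean more contact and higher self-esteem.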


Results 


Interethnic Attitudes 


To facilitate the presentation of the re-
sults, the data from each ethnic group were
analyzed separately. The data for each
group were analyzed in a 2 × 2 × 3 design
with two between factors and one within
factor. The between factors
were pre- versus post-desegregation and
whether the students had attended segre-
gated or integrated elementary schools. The
within factor consisted of the three ethnic
groups being evaluated.
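
As a check on the design just described, the error degrees of freedom reported with the F ratios can be recovered from the sample sizes. The sketch below assumes the usual rules for a design with four between-subjects cells and one three-level within-subjects factor; the helper names are hypothetical, and the group sizes are summed from the Table 1 cell frequencies and from the totals given in the Method section.

```python
def mixed_error_df(n_subjects, between_cells, within_levels):
    """Error df for between effects and for within effects in a mixed design."""
    between_error = n_subjects - between_cells
    within_error = between_error * (within_levels - 1)
    return between_error, within_error


def between_only_error_df(n_subjects, cells):
    """Error df when every factor is between subjects."""
    return n_subjects - cells


white_n = 141 + 151 + 67 + 149   # Table 1 cell frequencies for white students
black_n = 111 + 96 + 28 + 45     # Table 1 cell frequencies for black students
total_n = 309 + 487 + 528        # full sample reported in the Method section

white_df = mixed_error_df(white_n, between_cells=4, within_levels=3)
black_df = mixed_error_df(black_n, between_cells=4, within_levels=3)
self_esteem_df = between_only_error_df(total_n, cells=2 * 2 * 3)
```

These values agree with the error terms that accompany the attitude analyses for whites and blacks and the self-esteem analysis.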

The results for the white students indi-
cated that there was a main effect for ethnic
group being evaluated, F(2, 1008) = 68.03, p
< .001 (see Table 1). Whites evaluated their
own group more


contrasts indicated that whites from segre-
gated backgrounds developed more negative
attitudes toward all ethnic groups after de-
segregation, F(1, 504) = 11.75, p <
whereas those from integrated backgrounds
did not change their attitudes as a result of
desegregation (F < 1). Although whites
from segregated backgrounds did show a
significant negative change in attitudes
toward the out-groups during desegregation,
F(1, 504) = 8.42, p < .01, contrasts show that
this increase was not greater than the nega-
tive change in attitudes toward their own
group (F < 1). After desegregation, whites
from both types of backgrounds had similar
attitudes toward out-group members.

For the blacks, there was a significant
ethnic group main effect, F(2, 552) = 53.97,
p < .001 (see Table 1). As was the case with
the whites, blacks evaluated their own group
most positively. Again, like the whites, the
blacks also showed a tendency toward a
Type of Background × Pre/Post-Desegre-
gation interaction, but this tendency was
only marginally significant, F(1, 276) = 2.77,
p < .10. Contrasts indicated that blacks
from segregated backgrounds developed
more negative attitudes toward all the ethnic
groups after desegregation, F(1, 276) = 7.46,
p < .02; whereas blacks from integrated
backgrounds did not (F < 1). Again, the
segregated blacks’ attitudes toward the
out-groups did become more negative, F(1,
276) = 4.09, p < .05, but no more so than
their attitudes toward the in-group (F <
1).

The analysis also revealed a significant
Type of Background × Pre/Post-Desegre-
gation × Ethnic Group interaction, F(2, 552)
= 3.67, p < .03. This interaction seems to be
mainly due to the fact that blacks from seg-
regated backgrounds became more negative
toward all ethnic groups after desegregation,
while blacks from integrated backgrounds
showed negative changes only toward out-
group members. The attitudes of blacks
from integrated backgrounds toward mem-
bers of the in-group actually became slightly
more positive after desegregation.

Table 1
Interethnic Attitudes

                        Pre-desegregation attitude toward    Post-desegregation attitude toward
                                  Mexican                              Mexican
Background              Blacks    Americans    Whites        Blacks    Americans    Whites

White students
  Segregated
    M                   40.8      40.7         35.0          45.5      43.6         38.4
    SD                  9.6       9.1          8.7           10.2      9.2          7.5
    Cell frequencies    141       141          141           67        67           67
  Integrated
    M                   43.8      42.9         39.6          45.2      42.7         37.8
    SD                  12.2      10.9         14.1          11.3      10.6         8.5
    Cell frequencies    151       151          151           149       149          149
Black students
  Segregated
    M                   33.6      47.8         37.5          40.6      49.7
    SD                  10.7      15.5         12.8          13.5      16.7
    Cell frequencies    111       111          111           28        28           28
  Integrated
    M                   36.       44.4         42.4          32.6
    SD                  16.9      13.1         15.9          11.4      15.7         16.0
    Cell frequencies    96        96           96            45        45           45
Mexican American students
  Segregated
    M                   45.7      35.5         42.7          42.9                   40.1
    SD                  14.1      11.5         13.4          11.9
    Cell frequencies    146       146          146           112       112          112
  Integrated
    M                   42.6      32.9         41.3          43.4
    SD                  15.1      11.7         15.1
    Cell frequencies    64        64           64

Note. For means, higher numbers indicate more negative attitudes.

Mexican Americans displayed the same 
ethnic group main effect shown by the other 
two groups, F(2, 796) = 68.54, p < .01 (see 
Table 1). Again, their attitudes were much 
more positive toward their own group than 
toward the out-groups. No other effects 
were significant for this group. 


Interethnic Contact 


Each ethnic group had more contact with 
in-group members than with out-group 
members; F(2, 1008) = 842.15, p < .001, for
whites; F(2, 552) = 358.22, p < .001, for
blacks; and F(2, 796) = 636.73, p < .001, for
Mexican Americans (see Table 2). For
whites, there was a significant Type of Back-
ground × Pre/Post-Desegregation interac-
tion, F(1, 504) = 5.54, p < .02. Contrasts
indicated that whites from segregated
backgrounds did not change their levels of
contact as a result of desegregation (F < 1);
whereas whites from integrated backgrounds
had more contact with all groups after de-
segregation than before, F(1, 504) = 4.64, p


There was also a Type of Background ×
Pre/Post-Desegregation × Ethnic Group
interaction for blacks, F(2, 552) = 3.12, p <
.05. After desegregation, blacks from both
types of background showed no change in
contact with whites and a slight increase in
contact with Mexican Americans. But,
while blacks from integrated backgrounds
tended to increase in contact with blacks
after desegregation, blacks from segregated
backgrounds tended to decrease in in-group
contact.


Self-Esteem 


The data for self-esteem were analyzed by
a 2 × 2 × 3, three-between analysis of vari-
ance. Since all three ethnic groups were
included in this analysis, ethnic group was a
between-subjects factor. This analysis
yielded an ethnic group main effect, F(2,
1312) = 5.49, p < .01 (see Table 3). Fol-
low-up contrasts indicated that blacks and
whites did not differ in self-esteem (F < 1),
but both these groups had higher self-esteem
than Mexican Americans: F(1, 1312) = 9,
p < .01, for blacks; and F(1, 1312) = 5.66, p
< .05, for whites. There was also a signifi-
cant interaction between background, pre/
post-desegregation, and ethnic group, F(2,
1312) = 3.16, p < .05. Contrasts revealed
that while desegregation did not have a dif-
ferential effect on segregated versus inte-
grated whites or Mexican Americans, F(1,
1312) = 1.99, ns, and F < 1, respectively, it
did differentially affect segregated and in-
tegrated blacks, F(1, 1312) = 5.19, p <
Although the magnitude of the effect is not
large, the blacks from segregated back-
grounds decreased in self-esteem as a result
of desegregation, while blacks from inte-
grated backgrounds increased.


Correlational Analyses 


For the correlational analyses the data for
each ethnic group were collapsed across
pre/post-desegregation and segregated/
integrated background. The correlations
between the students’ attitudes toward the
two ethnic out-groups were high for all three
groups: r = .51, p < .001, for the whites; r =
.53, p < .001, for blacks; and r = .42, p < 40
for Mexican Americans. Further, the cor-
relations between each group's evaluation of
the in-group and its evaluation of the two
out-groups were all positive and significant
(range of .22 to .42; M = .32). In every case
the correlations between the in-group and
out-group attitudes were significantly less
positive than the correlation between the
attitudes toward the two out-groups (ps <
.05, using Fisher’s z test). These data
suggest that the tendency to have consistent
attitudes toward out-groups is stronger than
the tendency to maintain consistency be-
tween in-group and out-group attitudes.
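
The comparison of two correlations by Fisher's z test can be illustrated with a short Python sketch. Several assumptions apply. The text does not say which variant of the test was used, and the version below is the textbook form for two independent correlations, whereas correlations computed on the same students are dependent. The sample size of 508 is inferred from the white students' cell frequencies in Table 1, and .32 is the mean in-group/out-group correlation across all three groups rather than a value reported for whites alone. The function name is hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic and two-sided p for the difference between two correlations.

    Uses the independent-samples form of Fisher's r-to-z test.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)       # Fisher r-to-z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal tail area
    return z, p


# Illustrative inputs only; see the assumptions stated above
z, p = fisher_z_difference(0.51, 508, 0.32, 508)
```

With these inputs the out-group correlation exceeds the in-group/out-group correlation by well over the conventional .05 criterion, in line with the pattern reported above.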

A parallel effect was observed for the
contact measures. The correlations between
the amount of contact with the two ethnic
out-groups were positive for each group: r =
.37, p < .001, for whites; r = .37, p < .001,
for blacks; and r = .39, p < .001, for Mexican
Americans. The correlations between in-
group and out-group contact were all sig-
nificantly lower (range of −.05 to .14; M =
.07; all ps < .01, using Fisher's z test).

There was also a substantial degree of
consistency between attitudes toward out-
groups and contact with out-groups. The
correlations between each group's attitudes
toward and contact with each out-group
were all negative and significant (range of
−.22 to −.36; M = −.28; all ps < .001). For
the whites and Mexican Americans, the
correlations of in-group attitudes and con-
tact were significantly smaller than the cor-
relation between the attitudes toward a
particular out-group and contact with
members of that out-group (all ps < .01,
using Fisher's z test). A similar trend was
observed for the blacks, but the differences
were not significant (ts < 1.61, ns). These
results suggest that there is a stronger rela-
tionship between attitudes and behavior
toward out-group members than toward in-
group members.

Table 2
Interethnic Contact

                        Pre-desegregation contact with       Post-desegregation contact with
                                  Mexican                              Mexican
Background              Blacks    Americans    Whites        Blacks    Americans    Whites

White students
  Segregated
    M                   17.4      18.3         25.4          16.7      18.1         25.2
    SD                  3.7       4.1          2.4           3.0       3.9          2.5
  Integrated
    M                   17.8      17.7         25.0          18.4      19.0         25.2
    SD                  3.8       4.4          2.7           3.9       4.1          2.4
Black students
  Segregated
    M                   25.6      14.8         18.0          24.0      16.0         17.8
    SD                  2.4       4.6          4.0           4.6       4.4          2.5
  Integrated
    M                   24.5      16.8         18.7          25.4      17.4         18.7
Mexican American students
  Segregated
    M                   14.9      25.7         17.1          15.4      25.9         17.4
    SD                  4.6       2.6          4.4           2.3       2.1
  Integrated
    M                             24.3         19.2          17.1      24.2

Self-esteem was negatively and signifi-
cantly correlated with attitudes toward the
in-group for blacks (r = −.19, p < .001) and
for whites (r = −.18, p < .001). The corre-
lation for Mexican Americans was also neg-
ative, but it was only marginally significant
(r = −.07, p < .10). The correlations be-
tween self-esteem and out-group attitudes
were negative and significant for the whites
and Mexican Americans (range of −.11 to
−.19; all ps < .01). The correlation between
blacks’ self-esteem and their attitudes
toward whites was only marginally signifi-
cant (r = −.19, p < .10), and the correlation
of black self-esteem and attitudes toward
Mexican Americans was nonsignificant (r =
.01, ns). With the exception of the Mexican
Americans' contact with both out-groups (r
= .15, p < .01, for whites; and r = .18, p < .01,
for blacks), none of the correlations between
self-esteem and contact were significant.

Table 3
Mean Self-Esteem

Integrated    23.3    22.4    23.6    24.6


Discussion 


The results for the attitude measure in-
dicate that each group was highly ethno-
centric. The data from the contact measure 
also support this conclusion, since they show 
that each ethnic group had much more con- 
tact with members of their own group than 
with out-group members. These results are 
not surprising, in that ethnocentrism ap- 
pears to be a nearly universal characteristic 
of all cultural and subcultural groups 
(LeVine & Campbell, 1972; Sumner, 1906). 
But, this finding is important because of the 
widespread belief that minority group 
members hold their group in low esteem 
(Brand, Ruiz, & Padilla, 1974). Clearly, the 
minority students in this study have more 
positive attitudes toward in-group than 
out-group members. 

The attitudes of blacks and whites from 
segregated backgrounds toward both in- 
group and out-group members were more 
negative after desegregation than before. It 
is important to note that the negative 
changes in the attitudes of blacks and whites 
toward members of the out-group typically 
would have been interpreted as an increase 
in prejudice due to desegregation. By in- 
cluding in-group evaluations as a control in 
the present study, it is possible to show that 
these negative changes are part of a more 
general pattern to evaluate all groups more 
negatively after desegregation. The fact 
that the attitudes of blacks and whites from 
integrated backgrounds did not differ from 
before to after desegregation suggests that
desegregation caused the negative changes
in attitudes found for the blacks and whites 
from segregated backgrounds. It is likely 
that school desegregation was more stressful 
for the students from segregated back- 
grounds than for those from integrated 
backgrounds. Unlike students from inte- 
grated backgrounds, the students from se- 
gregated backgrounds had no previous op- 
portunities to learn the norms that govern 
interaction with out-group members in the 
school context. The effects of the stress
created by having to cope with these prob- 
lems apparently resulted in a generalized 
negative evaluation of all three groups. 

The interethnic attitudes of the Mexican 
American students from segregated back-
grounds did not change after desegregation.
This result is probably due to the limited 
desegregation that Mexican Americans ex- 
perienced in this school system. The de- 
segregation plan that was implemented re- 
sulted in one sixth-grade center containing 
89% Mexican Americans, 8% blacks, and 3% 
whites. Virtually all of the Mexican Amer-
icans from segregated backgrounds in the
present sample attended this school after
desegregation. For the Mexican Americans
from segregated backgrounds, the desegre-
gated schools were only slightly less segre-
gated than the schools they had previously
attended, which probably reduced the stress
they experienced. 

The results for self-esteem revealed only 
one significant effect involving desegrega- 
tion. Blacks from integrated backgrounds 
increased in self-esteem while those from 
segregated backgrounds decreased. These
results are parallel to those obtained by St.
John (1971) and can be interpreted in terms
of social comparison theory (Festinger, 1954;
Pettigrew, 1967). According to social com-
parison theory, people tend to evaluate their
abilities, traits, and emotions by comparing
themselves to similar others. As the atti-
tude data suggest, blacks from segregated
backgrounds probably experienced consid-
erable difficulty adjusting to the desegre-
gated schools. In contrast, it is likely that
blacks from integrated backgrounds, who
had extensive prior experience in multieth-
nic settings, adjusted relatively well. Thus,
in this respect, blacks from integrated
backgrounds could make a favorable com-
parison to blacks from segregated back-
grounds. These favorable comparisons may
then have led to increases in self-esteem for
the blacks from integrated backgrounds but
decreases in self-esteem for those from
segregated backgrounds.

Overall, blacks had the highest self-esteem
and Mexican Americans had the lowest.
This finding supports a trend that has ap-
peared in a number of recent studies (Ed-
wards, 1974; Hodgkins & Stakenas, 1969;
Hraba & Grant, 1970; McDonald & Gynther,
1965; Powell & Fuller, 1970). These studies
suggest that if it were ever true that blacks
had negative self-images, an assumption
questioned by Shuey (1966) and Banks
(1976), it is probably not currently the case.
The finding for Mexican Americans is con-
sistent with the findings of other studies (see
the review by Zirkel, 1971).

The correlations among the attitude
measures reveal the operation of tendencies
toward consistency. The finding that atti-
tudes toward out-groups are strongly corre-
lated supports the contention of authori-
tarian personality theorists that prejudiced
individuals tend to have generalized preju-
dices that are directed toward all out-groups
(Adorno, Frenkel-Brunswik, Levinson, &
Sanford, 1950). The fact that the correla-
tions between attitudes toward the out-
groups were higher than the correlations
between in-group and out-group attitudes
suggests that the correlations for out-groups
were not simply a product of a general re-
sponse bias to evaluate all others negatively
or positively. The results for interethnic
contact showed that interaction with the two
out-groups tended to be similar. The find-
ing that the correlations for out-group con-
tact were higher than the correlations be-
tween in-group and out-group contact indi-
cates that out-group correlations were not a
result of a general tendency to be more or
less sociable in relating to others.

Previous research leads to an expectation
that interethnic attitudes and interethnic
contact should be correlated (Allport, 1954;
Amir, 1969; Campbell, 1956; Webster, 1961).
This expectation was confirmed in the
present study. The correlational nature of
these data makes it impossible to determine
whether contact causes reductions in prej-
udice or reductions in prejudice cause in-
creases in contact. However, cross-lag cor- 
relational analyses of data from a longitu- 
dinal follow-up study of a sample of white 
students suggest that it is increased contact 
that reduces prejudice and not vice versa 
(Stephan & Rosenfield, 1978). The signifi- 
cant differences between the correlations of 
attitudes toward in-group and out-group 
members and attitudes toward the two out- 
groups that were paralleled by the correla- 
tions for contact and for the relationship 
between contact and attitudes, all support 
the conclusion that there is a clear distinc- 
tion in the orientation that the students 
adopted toward in-group and out-group 
members. 

The correlations of self-esteem with the 
other variables were generally low. There 
were small but generally significant corre- 
lations between self-evaluation and evalua- 
tions of in-group and out-group members. 
These results provide weak support for 
Fromm’s (1939) contention that self-hatred 
and hatred of others are inseparable. 


Conclusions 


The results from the present study indi- 
cate that desegregation had little effect on 
self-esteem and interethnic contact, but it 
did cause blacks and whites who had previ- 
ously attended segregated schools to evalu- 
ate both in-group and out-group members 
more negatively. These results are consis-
tent with the previous research on desegre-
gation, which shows that in the majority of 
cases, desegregation did not have positive 
effects on self-esteem, interethnic contact or 
prejudice. A common view of desegregation 
is that it should be regarded as a failure if the 
positive effects predicted by the social sci- 
entists testifying in the Brown v. Board of
Education¹ trial are not found (for a review
of these predictions, see Stephan, 1978).
From this perspective, the fact that deseg-
regation has been found to have no effects or
negative effects in so many studies is taken
as evidence that desegregation does not
work. In the introduction, it was argued
that the studies of desegregation are simply
too weak methodologically to constitute a
reasonable basis for coming to such a con-
clusion. One reason why the present study
and many other studies have found negative
or no effects due to desegregation is that
most of these studies investigated the effects
of newly instituted desegregation plans. It
could be argued that the first years of de-
segregation constitute a period during which
positive effects are least likely to occur.
Desegregation plans often include changes
in curricula and the desegregation of the
teaching staffs as well as the desegregation
of the students. Combined with the nega-
tive effects of community opposition to de-
segregation, the anxieties of the teachers and
students, and confusion over school sched-
uling, it is not surprising that desegregation
rarely has initial positive effects on race
relations or self-esteem. If desegregation
has positive effects, it is probable that it
takes more than a year or two for them to
evolve.

¹ Brown v. Board of Education of Topeka, 98 F.

Children’s Metacognitive Knowledge About Reading


Meyer Myers II and Scott G. Paris 
Purdue University 


Children’s metacognitive awareness of variables that influence reading was as-
sessed in an interview study. Eight- and 12-year-old children answered ques-
tions about the effects of personal abilities, task parameters, and cognitive
strategies involved in reading. Although young children were aware of the in-
fluence of some reading dimensions such as interest, familiarity, and length,
they were less sensitive to the semantic structure of paragraphs, goals of read-
ing, and strategies for resolving comprehension failures than sixth-grade chil-
dren. Age-related differences in metacognitive knowledge may be correlated 
with the acquisition of efficient memory, problem-solving, and reading skills. 


Solving problems, remembering a series 
of words or pictures, and comprehending 
prose are often deliberate actions that re- 
quire self-invoked plans and cognitive skills. 
In order to accomplish these goals, a learner 
must coordinate a variety of information 
regarding the task and his available strate- 
gies and apply it appropriately to the prob- 
lem at hand. The general knowledge that 
guides effective selection and implementa- 
tion of task-relevant skills has been referred 
to as metacognition (Brown, in press; Fla-
vell, 1977). It is regarded as a “higher” level
of thinking than task-specific strategies be- 
cause metacognitive knowledge constitutes 
transituational information about the pa- 
rameters of learning and performance. 
Metacognitive knowledge serves an execu- 
tive function of coordinating and directing 
the learner’s thinking and behavior. 
Flavell and Wellman (1977) identified 
person, task, and strategy variables as three 
important categories of metacognitive 
knowledge that might help children to remember
effectively. First, children need to
know about their own enduring characteristics
and transient conditions that influence
performance. Learners or memorizers need
to appraise realistically their potential in
order to engage in skills commensurate with 
their ability. Second, children need to know 
about the purposes, scope, and requirements 
of the task before the problem can be effi- 
ciently attacked. Third, one needs to be 
aware of the existence of relevant strategies 
and to recognize the need to apply them. 
Further, one must form plans, generate hy- 
potheses, check one's progress, evaluate re- 
sults, and generalize behavior. In some 
senses, these are ideal characterizations of 
the knowledge required to solve problems or 
remember. Yet, an extensive literature, 
particularly on memory development, has 
shown that while adults and older children
are often sensitive to metacognitive variables,
children younger than 8 years of age
are less sensitive (Brown, in press; Flavell,
Note 1). Since the development of children’s
metacognitive knowledge is correlated
with efficient learning, remembering, and
communicating, it may provide a critical link
in explaining the transition from a novice to
a sophisticated problem solver.
Reading is a complex behavior that involves interactions among perceptual processes, cognitive skills, and metacognitive knowledge. For example, Stauffer (196?) cited a 1936 definition by Gray stating that effective reading “assumes that the reader not only recognizes the essential facts or ideas presented, but also reflects on their significance, evaluates them critically, discovers relationships between them, and clarifies his understanding of the ideas apprehended” (pp. 8-9; emphasis added).
Awareness and deliberate use of such com- 
prehension monitoring strategies is critical 
for proficient reading. The value of thinking 
about one’s thinking, or awareness of meta- 
cognitive knowledge, and its relationship to 
good reading skills has been stressed re- 
peatedly since Dewey’s (1910) emphasis on 
reflective thinking. A recent quotation from 
Gibson is particularly illuminating with re- 
spect to the role of metacognition in read- 
ing: 


One [trend in cognitive development] that seems to me 
especially important is the increasing ability to be aware 
of one's own cognitive processes, from the segmentation 
of the phonetic stream all the way up to the understanding
of the strategies of learning and problem-solving.
There seems to be a consciousness-raising that
goes along with many aspects of cognitive development,
and it turns out, I think, to be associated with attaining 
mature reading skills. (Gibson, 1974, p. 25) 


Despite the importance attributed to re- 
flective thinking and the role of metacogni- 
tive knowledge, little research has been 
conducted to assess children’s knowledge 
about the parameters of reading. If the 
foregoing claims about the importance of 
metacognition are true, then children’s un- 
derstanding of skills, purposes, and dimen- 
sions of reading should influence how they 
learn to read. What kinds of metacognitive 
understanding do young children have about
reading? Unfortunately, the answer, much
like the data on metamemory (see Flavell &
Wellman, 1977), seems to be “not much.”
Reid (1966) conducted a series of interviews
with 5-year-old beginning readers to find out
what concepts they had about the activity of
reading. She observed that children approached
reading as “a mysterious activity,
to which they come with only the vaguest
expectancies,” and “were not even clear
whether one read the pictures or the other
marks on the paper” (Reid, 1966, pp. 60-61).
Although most 4- and 5-year-olds can differentiate
writing from other characters and
drawings (Lavine, 1977), beginning readers
do not seem to understand the goals or
meaning of reading. Clay (1973) found that

of 5-year-old school entrants in New 
Zealand did not know that print rather than 
pictures told the story. After 6 months of
school, nearly 90% of the children knew this
metacognitive information about the task.
Yet, some children still confused the pur- 
poses and nature of reading after a year of 
schooling. 

In addition to conceptualizing the pur- 
poses and scope of reading tasks, children 
must learn to employ strategies such as 
predicting, planning, checking, and gener- 
alizing. The propensity to engage in or un- 
derstand the need for comprehension mon- 
itoring increases with age. Self-correction 
rate in oral reading, for example, is a spon- 
taneous and overt form of monitoring one’s 
reading. In Clay’s (1973) research, the top 
50% of young readers corrected sponta- 
neously 1 of 3 errors, while poor readers only 
corrected 1 of 20 errors. In fact, the rate of 
self-correction was more closely related to 
progress in the first three years of instruc- 
tion, when the emphasis is on oral reading, 
than either intelligence or reading readiness 
scores (Clay, 1973). Clay’s results indicated 
that comprehension checking is a useful 
strategy and develops with skill efficiency in 
reading. Beginning readers and poor read- 
ers are less likely or less able to monitor their 
own understanding. For example, when 
Clay (1973) asked large groups of 7- and 8- 
year-olds, “What do you do when you come 
to a word you don't know?" nearly 50% of the 
7-year-olds responded “Don’t know,” “I'd 
skip it,” or reported other kinds of defaults. 
Only 4% of the 8-year-olds responded with
these kinds of shrug-the-shoulders answers.
Usually, they answered that they would analyze
the word parts, use the sentence context,
or solicit help.

In general, beginning readers, like young 
children in other cognitive tasks, have an 
extremely limited understanding of the task 
dimensions and the need to apply strategies 
for reading. The purpose of the present 
investigation was to extend the analysis of 
children's metacognitive knowledge about 
reading in order to provide a broader de- 
scription of their conceptualizations. 
Standardized questions were given to chil- 
dren in order to assess their understanding 
of person, task, and strategy variables in- 
volved in reading. This was a preliminary 
study modeled after Kreutzer, Leonard, and 
Flavell's (1975) investigation of children's 
metamemorial knowledge. Our procedures 
and questions were guided by their work and
our own intuitions about important param- 
eters of reading. In order to extend the re- 
search on beginning readers (Clay, 1973; 
Reid, 1966) and to coincide with the age 
ranges usually studied in metamemory tasks, 
two groups of children who could already 
read but who were of widely different ages 
and abilities were tested. 


Method 


Subjects 


The subjects were 20 second graders (mean age = 7 
years 9 months; range = 7 years 2 months to 8 years 9 
months) and 20 sixth graders (mean age = 11 years 9 
months; range = 11 years 2 months to 12 years 2 
months), balanced for sex and selected without regard 
for reading ability. 


Materials 


Eighteen interview items, each consisting of varying 


mode, length of story, speed, preference, goals, structure
of paragraphs, and familiarity.
strategy variables


monitoring as reading skills. Due to the exploratory
nature of the study, several unproductive items were
included on the interview script. Some were purposefully
incorporated in order to maintain children’s in-
; conversational nature
were ambiguous or re-


a selective report of consistent data that bear on developmental
aspects of reading.¹


Procedure 


individually in a quiet 
room at school. The child and the experimenter were
with a microphone and 
The children were in- 


or “wrong” answers to the questions and that we
“want to know what you think.”


The questions were read from the script in a conversational manner in the same order for all children. If a subject was unable to answer or clearly misunderstood the question, it was repeated. If the repetition failed to elicit a response, the question was rephrased until an answer was produced. The entire session was tape recorded. In general, the sessions were relaxed and informal and lasted about 25 minutes.


Scoring 


Each child’s responses were transcribed into a written
account from the tape-recorded interview. Two judges
checked the transcriptions and recoded lengthy responses
into one- or two-word summaries that were
semantically equivalent to the original reports. There
were fewer than 2% disagreements between judges in
this phase of data reduction, and they were resolved
through mutual agreement. After preliminary examination
of the data, several categories of responses were
established for each interview item. Only the first responses
given by each child were analyzed in order to


oversight, some subjects were not administered all of the
interview items. Omissions were few and considered
to be random; consequently, sample size varied slightly
for different questions.


Results 


In order to facilitate presentation of the
data, each of the 18 interview questions is
given verbatim and grouped according to


¹ Copies of the complete interview are available from
the authors.
Table 1
Frequency of Subjects Reporting Various Characteristics of Good Readers

               Special skills                    General skills         Motivation
                                              General     Schooling   Likes    Tries    Don’t
Grade    Practice  Vocabulary  Pronunciation  knowledgeᵃ  and age     reading  harder   know
Second      6          1            0             5           2          0        1       5
Sixth      14          0            1             1           0          2        2       0

ᵃ Responses included “knows more,” “better learner,” and “more experience.”


knowledge about individual abilities. The 
first question follows: 

1. “What makes someone a really good
reader?” Although responses were varied,
they were classified into the four categories 
shown in Table 1: general knowledge, spe- 
cial skill, motivation, and don’t know. 
Seventy percent of the sixth graders but only 
40% of the second graders reported that 
practice and special skills were necessary 
components of good reading. On the other 
hand, 25% of the younger children were un- 
able to report any qualifications of good 
readers, while none of the older children was 
unable to speculate about the characteristics 
of good readers. When knowledge of special 
skills was compared to the less sophisticated 
responses of the combined categories of 
general knowledge, motivation, and don’t 
know, the grades differed significantly, χ²(1)
= 4.94, p < .05.²
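
The reported value can be checked against the frequencies in Table 1. The following is only an illustrative sketch and not part of the original analysis. It assumes that "special skills" pools the Practice, Vocabulary, and Pronunciation columns (7 second graders and 15 sixth graders) against all remaining responses (13 and 5), and it applies the Yates continuity correction that the authors state was used for four-fold tables. The function name is hypothetical.

```python
def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Continuity correction: reduce |ad - bc| by n/2, but never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * corrected ** 2 / (row1 * row2 * col1 * col2)


# Assumed pooling of Table 1: rows are grades, columns are
# (special skills, all other responses).
second_grade = (7, 13)
sixth_grade = (15, 5)

chi2 = yates_chi_square(*second_grade, *sixth_grade)
print(f"chi-square(1) = {chi2:.3f}")
```

Under these assumptions the statistic comes out near 4.95, which agrees with the reported 4.94 up to rounding.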
A second question provided additional
information about the specific skills pos- 
sessed by good readers and was designed to 
determine whether children perceive reading 
ability as a general manifestation of school 
achievement or as a specialized skill. Chil- 
dren were presented with the following sit- 
uation: 

2. “The other day I talked to a boy [or
girl] who was really good at arithmetic.
Then I asked him [or her] if he [or she] was
a good reader. What do you think he [or
she] said?” (Questions were phrased in
terms of the same sex of subjects.) Twelve
of 20 second graders reported that good
mathematics skills are associated with good
reading skills, while 14 of 20 sixth graders
realized that the two skills were not necessarily
dependent. One child in each grade
responded that reading depends on the individual
child. The age difference was significant,
χ²(1) = 3.83, p < .05. Responses to
Questions 1 and 2 indicated that older chil- 
dren reported that proficient reading in- 
volves specialized skills, while younger 
children did not. 

Motivation and limitations. Another 
metacognitive bit of information relevant to 
a person’s concept of his abilities is the 
awareness of limiting conditions and how 
one might overcome them. We were curious 
to see if children are sensitive to reading 
development as a function of opportunity 
and motivation and gave them the following 
hypothetical situation: 

3. “Suppose there were two boys named 
John and Alan who came from different 
homes. John’s parents were wealthy, and 
John had lots of toys and books. Alan's 
parents, though, were poor and didn’t have
many books at home. Do you think one of 
these boys was a better reader at school? 
Which one? Why?” Nearly all of the 
younger children (90%) reported that the 
rich boy with more environmental opportu- 
nities could read better. On the other hand, 
65% of the sixth graders reported that the 
poor boy would read as well if not better than 
the rich boy (35% reported that the poor boy 
would be better). They explained that the 
poor boy’s limitations might be qualified by 
motivation and other factors. A typical 
justification reported by sixth graders was 
the following: “He (the poor boy) would 
spend more time reading, and the rich boy 
would play around.” Significantly more 
sixth graders than second graders reported
that the poor boy would be equal to or a
better reader than the rich boy, χ²(1) = 8.9,
p < .005.

² All chi-square analyses derived from four-fold
contingency tables were Yates corrected.


Task Variables 


Materials. Three examples of task vari- 
ables that can influence reading are the 
length of the passage, the familiarity of the 
story content, and the reader’s interest in the 
story. The following questions were pre- 
sented to children to assess their awareness 
of these parameters: 

4. “One day I asked Jim to read a story 
that was five pages long while Tom read a 
story that was two pages long. Which boy 
took the longest to read the story? Who do
you think remembered the most?” 

5. “The whole class was going to read a 
story about New York City. Ann was in 
New York last summer for her vacation. Do 
you think that the story might be easier or 
harder for Ann to understand than Jane who 
had never been to New York?" 

6. "What's your favorite kind of story? 
(Child’s response is designated X). Say
your teacher wanted you to read something, 
something you really didn't like as much as 
X. Which do you think you would read 
faster, X or the teacher's story? Which one 
would be easier to remember?" 

The majority of children from both age
groups reported that these variables affect
reading. All sixth graders and all second
graders reported that long passages require
more reading time than short passages, and
all sixth graders and 75% of the second
graders reported that familiarity with the
story content facilitates story comprehension.
Additionally, 65% of the sixth graders
and 75% of the second graders reported that
preferred stories can be read faster than
stories that are disliked, and 90% of the sixth
graders and 85% of the second graders reported
that preferred stories would be easier
to remember. Thus, no significant age differences
were found for the length, familiarity,
and interest variables.

Reading mode. A task parameter rele- 
vant to reading is the mode in which material 
is read. To assess this variable, children 
were asked the following question: 

7. “Which is quicker, reading out loud or reading to yourself?” Eighty-nine percent of the sixth graders and only 50% of the second graders indicated that reading silently is faster than reading aloud, while 45% of the second graders and only 11% of the sixth graders replied that reading aloud and silently resulted in the same reading speed. A comparison of “aloud” and “silent” responses across grades yielded significant grade differences, χ²(1) = 4.60, p < .05.

Structural cues. The reader’s knowledge of structural features of prose might serve as a guide to comprehension. Several questions were constructed in order to investigate children’s awareness of paragraph structure. Children were asked the following question:

8. “Is there anything special about the way sentences go into a paragraph or story?” Seventy percent of the sixth graders and 47% of the second graders indicated that they were aware that sentences are organized within a paragraph, but this age difference was not significant. However, only 2 of the 9 second graders who reported awareness of structural features of paragraphs gave justifications that specified the sequential nature or common topics of sentences in paragraphs, while 9


9. “What does the first sentence usually do for a paragraph or story?”

10. “What does the last sentence do?”

As shown in Table 2, 80% of the sixth graders reported that the leading sentence is a semantic introduction to the paragraph, while only 20% of the second graders reported that sentence attribute. The majority of the second graders did not know the function of the first sentence or reported that it began the paragraph or started with a capital letter. Young children were less aware of the semantic characteristics of the first sentence, χ²(1) = 9.24, p < .005.

In response to Question 10, 50% of the sixth graders and only 1 second grader reported semantic summary properties for the last sentence of a paragraph. The remaining subjects either reported temporal or punctuation properties of the sentence or said, “I don’t know.” Comparing “summary” responses with all other responses yielded a significant age difference, χ²(1) = 7.07, p < .01. While many children from both grade levels reported that sentences are organized to form paragraphs, significantly more sixth graders were aware of the semantic properties of the first and last sentences of a paragraph.


Goals 


In order to solve problems, one must understand the task, form a conception of the goal, and select and implement appropriate means to attain that goal. The way in which young children perceive the goal of a given task may differ from that of older children. To test this possibility for reading, children were asked the following question:

11. “Do you ever tell the story that you read to someone else? What do you try to tell them, all the words or just the ending or what?” Ninety-five percent of the sixth graders indicated that they would attempt to reproduce the story meaning during recall, while 45% of the second graders responded that they would attempt to reproduce the story verbatim. Comparing “verbatim” and “meaning” responses across grades yielded a significant grade difference, χ²(1) = 7.68, p < .01. Almost all the sixth graders perceived the goal of a story recall task as meaning construction, but the goal for nearly half of the second graders was exact reproduction.

If young and old children perceive task 
goals differently, it might be expected that 
they also differ in their awareness of the 
appropriate selection of a means to attain 
their selected goal. The following question
was asked to reveal information relevant to 
this hypothesis: 

12. “The other day I asked Bill to read 
a story and then to tell me what he read.
Before he started reading, though, he asked 
me if I wanted him to remember the story 
word for word or for the general meaning. 
Why do you think he asked me that?” The 
wide variety of responses to this question 
were grouped into the following three con- 
ceptual categories: specific strategy, general 
aid, and other responses. Children’s re- 
sponses were included in the first category 
if they indicated that knowledge of task goals 
could elicit specific study strategies. As an 
example, one sixth grader reported that “If 
you wanted him to remember words, he 
would take a lot longer ‘cause he would 
memorize; if meaning [he NUM -— 
parts of the paragrap portant A 
If children reported that knowing the goal of 
the task would help them respond correctly 
but did not specify how it would help, their 
response was scored as “general aid.” This 
category included responses indicating that 
knowing the goal would 
and that they would know what information 
was required for recall. The “other response”
category included bizarre justifications
and “I don’t know” responses. As
shown in Table 3, 60% of the sixth graders 


Table 2
Characteristics of First and Last Sentences in Paragraphs


Semantic characteristics 


.—— Nonsemantic characteristics — — 
Don't know 


Introduce or Tunt Salinas 
Grade summarize topic n or ei 


Second 6 


First sentence 
2 


p 1 
Sixth 17 2 o 
9 


Last sentence 


Second 1 5 


3 
4 


Sixth 10 5 i


ᵃ Includes responses such as “starts with a capital letter” or “ends with a period.”
Table 3
Frequency of Subjects Reporting Why Someone Would Want to Know the Goal of a Reading Task

          Specific strategy        General aid                 Other
          Study             Help       Know required    Just to
Grade     differently       remember   answer           know     Irrelevant
Second         1               2            2              3         2
Sixth         12               1            3              0         2

realized that knowing the goal of a reading 
task can lead to the employment of different 
strategies, while only 1 second grader indi- 
cated any awareness of differential strategy 
use applied to the presented situation. Fifty 
percent of the second graders had no idea 
why someone would want to know the goal 
of the reading task. The trend for more 
sixth graders to indicate specific strategies 
and for more second graders to indicate
“other responses” was significant, χ²(1) =
12.96, p < .001.

A potential reason that second graders do 
not differentiate exact reproduction from 
meaning reconstruction is that they perceive 
exact word recall as equivalent to, or at least 
as easy as, recall of the story meaning. To 
investigate that possibility, children were
asked the following: 

13. “Which would be easier to do, read
word for word or for the general meaning?”
Sixty-five percent of the second graders and
90% of the sixth graders reported that
meaning recall is easier than exact reproduction.
The grade differences were not
significant, indicating that most of the second
graders realized that meaning recall was
easier than exact reproduction.

Another reason for differential strategy 
use by grades may be that second graders are 
not aware of the importance of selecting 
specific strategies for particular goals. We 
asked the following: 

14. “Would you do anything differently 
if you had to 
Only 33% of the second graders, as opposed 


different strategies for the different tasks,
χ²(1) = 6.46, p < .025.

In general, second graders reported that
exact recall is more difficult than meaning
recall but were not able to report different
strategies, indicating an awareness of differential
task difficulty but illuminating
their obliviousness of matching means to
goals. Older children seemed to be more
aware of subordinating appropriate means
to specific goals and better able to discriminate
the varying difficulty of tasks than
younger children.


Strategy Variables 


Skimming. The previously discussed differences of second graders reiterating words and sixth graders reconstructing meaning may indicate that younger children perceive the purpose of reading as decoding, while older children perceive a goal of meaning extraction or construction. This difference should be reflected in perceived strategies for skimming. If the reader’s goal is to decode written material, then he or she may be expected to attend to easily pronounced and familiar words while skimming. Readers concerned with meaning extraction would attend to those words and phrases that convey the most information. To examine this hypothesis, children were asked Question 15:

15. “If you had to read a story quickly and could only read some of the words, which ones would you try to read?” As indicated in Table 4, 70% of the second graders and only 30% of the sixth graders reported that while skimming, they would attend to words that would be easy for them to read. Sixty percent of the sixth graders
Table 4
Frequency of Subjects Attending to Different Aspects of a Paragraph While Skimming

              Easy              Informative                Other
         Familiar  Easy    Informative  Difficult   First     Any    Don’t
Grade    words     words   words        words       sentence  half   know
Second      5       10         0            0          0       3       2
Sixth       1        5         6            6          2       0       0


and none of the second graders indicated 
that they would skim for words that yielded 
the most information. The other 10% of the 
sixth graders indicated a skimming strategy 
of reading only the first portion of each 
paragraph. An analysis of the categories 
“easy words” versus “information” yielded 
a significant difference between grades, χ²(1)
= 16.08, p < .001.

Resolving comprehension failures. De- 
termining the meaning of unknown words 
and sentences is a crucial aspect of reading. 
Even sophisticated readers encounter in- 
comprehensible material and need to draw 
upon strategies to resolve those compre- 
hension failures. Several questions were 
constructed to investigate children’s 
awareness of their own methods for deter- 
mining unknown information. One ques- 
tion we asked was the following: 

16. “What do you do if you don't un- 
derstand a word that you read?” As illus- 
trated in Table 5, all children indicated a 
strategy for determining an unknown word. 
Both groups said they would ask other peo- 
ple for help in learning new words (40% of 
the second graders and 35% of the sixth 
graders). Thirty-five percent of the sixth 
graders reported that they would seek help 
from a dictionary, and 40% of the second
graders would try to sound out the words.
The trend for second graders to respond 
“sound out” and sixth graders to respond 
“dictionary” was significant (Fisher's exact 
p <.005) and offers further support for the 
hypothesized decoding goal of young chil- 
dren. 

In order to investigate children’s aware- 
ness of strategies to resolve sentence com- 
prehension failures, they were asked the 
following question: 

17. “What do you do if you don’t un- 
derstand a whole sentence?” Examination 
of Table 5 indicates that the most frequent 
answer was again to seek help from other 
people (40% of the second graders and 55% 
of the sixth graders), but unlike responses for 
determining words, 30% of the second grad- 
ers could not report how they would resolve 
this comprehension failure. Comparing 
“don’t know” responses with “ask another 
person” responses yielded a significant grade 
difference (Fisher’s exact p < .005). 

In order to determine if children would 
reread a passage to comprehend a sentence,
the children were asked Question 18: 

18. “Do you ever have to go back to the 
beginning of a paragraph or story to figure 
out what a sentence means? Why?”
Fifty-five percent of the second graders and 


Table 5
Frequency of Subjects Reporting Strategies for Resolving Comprehension Failures
                                                          Nonstrategic effort or default
Grade  Ask someone  Dictionary  Sound out  Context  Reread  Think  Try  Skip  Don’t know
Unknown word


Second 8 0 8 


0 0 2 0 


0 
Sixth 1 1 2 1 1 
6 


Second 


Unknown sentence 
0 1 


i-i o3 
AW ERES bibas. 


: 8 0 
Sixth il 0 0 2 » 
80% of the sixth graders reported that they
would reread the paragraph. The difference 
between grades was not significant. How- 
ever, 75% of the sixth graders who responded 
that they would reread reported the justifi- 
cation that rereading the paragraph would 
provide information and contextual cues 
useful for determining the sentence meaning. 
Eighty-eight percent of the second graders 
who responded “reread” reported non- 
strategic justifications or could not justify 
their response. Significantly more sixth 
graders reported that they would reread in 
order to utilize contextual cues to resolve 
sentence comprehension failures, χ²(1) =
6.31, p < .025. In general, young children
had few resources available for deciphering 
the meaning of unknown words or sentences 
and seemed insensitive to the need for re- 
solving comprehension failures. 


Discussion 


Young children in this study were unaware
of many important parameters of reading.
They were not sensitive to task dimensions
or the need to invoke special strategies for
different materials and goals. They reported
few strategies or reasons for checking
their own understanding or progress and
were not aware of specific characteristics of 
proficient readers. Although young children 
were aware of limitations such as opportu- 
nity to read, they did not report neutralizing 
factors such as motivation for overcoming 
those obstacles. While young children were 
aware of some task variables (e.g., interest, 
familiarity, and story length) and indicated 
that sentences are organized within para- 
graphs, they were insensitive to specific se- 
mantic features such as sequencing or com- 
mon topics. Also, they were unaware of the 
introductory and summary qualities of first 
and last sentences in paragraphs. Second 
graders were less sensitive to the strategies 
required by different reading and memory 
goals. They reported fewer strategies than 
older children and were not as accurate in 
coordinating particular strategies with spe- 
cific task goals. On the other hand, older 
children were aware of the existence of various
reading strategies and were sensitive to
when and how to use them. 
Children’s reported awareness of metacognitive knowledge about reading is consistent with data on children’s reports about metamemory (Kreutzer, Leonard, & Flavell, 1975). Young children from both studies were aware of the facilitative effects of familiarity with task materials, recursive operations (rereading and more study time), and paraphrasing. Consistent age differences were also found. Young children tend to refer to external sources such as other people to resolve unknown information, while older children generate more internally oriented strategies. Older children in both studies distinguished between tasks where the perceived level of task difficulty has implications for amount and kind of preparatory action and were generally more sensitive to subordinating the appropriate means to the service of remembering or reading.

A general implication of children’s responses in the present study is that second graders perceive reading as an orthographic-verbal translation problem rather than as a meaning construction and comprehension task. Young children were relatively insensitive to semantic dimensions of paragraphs or to goals and methods of meaning apprehension. They focused on exact story reproduction rather than recall of a story’s general meaning and thought reading aloud was quicker than silent reading. Also, they seemed to be unaware of the special characteristics of good readers and the special strategies required for monitoring understanding. In general, second graders focused on decoding goals rather than semantically related goals for reading and indicated few strategies appropriate for information extraction or construction. Sixth graders were more aware of meaning dimensions of paragraphs and of the skills required to achieve understanding.

The present data are restricted to descriptions
of age-related changes in children's
reports, but there are several speculations
that could be offered to explain the
development of metacognitive knowledge
about reading. A likely explanation is that
educational materials and teachers’ strategies
are oriented toward decoding goals and
translation skills in beginning reading.
Young children’s metacognitive knowledge 
would be entirely consistent with explicit 
information provided by teachers if this were 
true. An alternative speculation is that
children induce and abstract metacognitive 
knowledge from many settings and prob- 
lem-solving situations and that greater 
awareness of means, goals, and task parameters
about reading reflects a general developmental
accomplishment (Paris, Note 2).
In support of this view, children seem to acquire
an explicit awareness of mnemonic
skills and goals between the ages of 6 and 12 
years (Flavell & Wellman, 1977). Indeed, 
young children’s difficulties with deliberate 
problem-solving situations such as memory, 
reading, and referential communication may 
be a manifestation of their incomplete metacognitive
awareness of person, task, and
strategy variables that influence perfor- 
mance. 

Research on the relationships between 
teachers’ behavior and students’ metacog- 
nitive knowledge, reading knowledge and 
actual reading performance, and understanding
of reading vis-à-vis other cognitive
skills is needed to elucidate the origins of
children's metacognitions. Combining behavioral
research with interview studies such
as the present investigation would help to 
disentangle the confounds between chil- 
dren’s verbal skills and reported knowledge 
and help to isolate the functional aspects of 
metacognitive knowledge that guide chil- 
dren’s performance. Investigations that 
employ such converging operations may 
yield information regarding the cognitive 
processes and knowledge that underlie efficient
reading.

Although the pragmatic implications are 
numerous, we think the present research 
suggests several fundamental relations
among instruction, metacognitive development,
and reading proficiency. First, instructional
activities may influence readers’ planfulness
and facilitate self-guided behavior.
A proficient reader has learned to
bring a purpose to a particular task and is
flexible so that different goals can be set
under different conditions. For example,
teachers may provide instruction to readers
for regulating behavior according to passage
difficulty, story length, amount of memory
demands, and various amounts of study
opportunities in order to maximize com- 
prehension and memory. If one is aware of 
explicit task goals and aware of how different 
task parameters affect those goals, one can 
more easily select strategies and execute 
processes to meet those goals. Deliberate 
and efficient goal setting may be sensitive to 
direct instruction and is one important re- 
lationship between metacognition and 
reading (Stauffer, 1969). 

A second potential advantage of explicit 
awareness of reading variables is that it 
permits one to deliberately ignore irrelevant 
information and attend to meaningful as- 
pects of the task. For example, proficient 
readers may learn to ignore pictures, type 
setting, and background features of the 
message when they are tangential to the goal 
of meaning extraction. Deliberate attention 
involves perceptual processes but also could 
involve the recruitment of special strategies 
for understanding. A proficient reader may 
utilize such strategies as underlining, note 
taking, or selective rereading. Incorporation 
of such skills into the reader’s knowledge 
base and awareness of the value of those 
skills must precede their deliberate em- 
ployment. Awareness of one’s potential 
abilities and the development of a repertoire 
of task-relevant information may be ac- 
quired through a combination of instruction 
and induction. The development of the 
reader’s repertoire of knowledge, that will be 
necessary for deliberate and subsequently 
automatic skills of decoding and compre- 
hending, may be facilitated by explicit in- 
structions and ample reading experience. 
These aspects of reading, planful goal sett- 
ing, selective attention, strategy recruitment, 
and a repertoire of information interact 
continuously during competent reading and 
may be amenable to training and remediation.

The purpose of this study has been to illustrate
how reading skills can be embedded
in a cognitive framework and related to 
children's developing appreciation of a va- 
riety of metacognitive knowledge. If me- 
tacognitive knowledge about reading is 
shown to be critical for the acquisition of 
reading skills, then educators may want to 
incorporate specific programs for teaching 
this information to children into reading
curricula. The results of the present study 
demonstrate that beginning readers have a 
limited understanding of reading as a cog- 
nitive activity and certainly could profit 
from instruction regarding the means, goals, 
and parameters of proficient reading. 


A Test of the Developmental Imagery Hypothesis in
Children’s Associative Learning 


Joel R. Levin and Michael Pressley 
University of Wisconsin —Madison 


Separate samples of kindergarten children were administered a paired-asso- 
ciate learning task at the beginning and end of the school year, under either 
regular (control) or self-generated visual imagery instructions. Consistent 
with previous speculations about the relationship between maturation and the 
efficacy of imagery instructions on this task, age predicted paired-associate 
learning performance in the imagery condition even when general ability and 
amount of formal schooling were controlled. In contrast, age was not signifi- 
cantly related to learning in the control condition. These conditions-differen- 
tiating correlational patterns, along with predicted variance differences, lend 
support to the developmental imagery hypothesis. 


One of the most dependable ways to im- 
prove people's learning of arbitrarily paired 
nouns or objects is to present the items in the 
context of a meaningful interactive picture, 
or imposed pictorial elaboration to combine 
Rohwer's (1973) and Levin's (1976) termi- 
nology. This statement can be safely gen- 
eralized across virtually all individual-dif- 
ferences variables, since it has been found to 
hold for individuals of both sexes, of widely 
different mental abilities, from disparate 
social classes or ethnic groups, and the like.
With specific regard to age or development, 
the statement appears warranted at least for 
individuals between the ages of 3 (Jones, 
1973) and 70 (Treat & Reese, 1976). 

In contrast, whereas people older than 4 
or 6 years of age are quite adept at generating 
analogous facilitative pictorial interactions 
internally as visual images (induced pictorial
elaborations), children younger than
this are typically unsuccessful at adopting 
such an induced imagery strategy when instructed
to do so (see Levin, 1976; M. Pressley,
1977). Between the ages of 5 and 10,
increases in the benefits accruing to an in- 
duced imagery strategy have been noted 
(Levin, Davidson, Wolff, & Citron, 1973; 
Pressley & Levin, 1977; Wolff & Levin, 1972; 
Eoff & Rohwer, Note 1). 

To date, however, it has not been possible 
to claim that the greater effectiveness of the 
imagery strategy in older, as opposed to 
younger, children represents a strict devel- 
opmental increase. This is because formal 
schooling in this country begins at about 5 
years of age, and it may be that increases in
imagery strategy effectiveness are attribut- 
able to educational rather than develop- 
mental variables per se (see M. Pressley, 
1977; Reese, 1974). This study was con- 
ducted to disentangle the contributions of 
age and educational experience to the de- 
velopmental imagery phenomenon. In 
particular, children between about 5 and 6½
years of age with either some or no formal 
schooling were selected to ferret out the in- 
dependent contributions of age and amount 
of education during this apparently critical 
imagery-transition period.


Method 


Subjects and Design 


Separate samples of rural kindergarten children were 
individually tested in the early fall and late spring of the 
school year. In the fall, 60 children were randomly assigned
to two experimental conditions and tested at that
time. Another 65 children from the same classrooms 
were randomly assigned to the two experimental con- 
ditions for spring testing. It is important to note in 
subsequent interpretations that none of the children 
were repeating the grade and no special admission 
policies (such as those based on precocity) were em- 
ployed by the school. At the time of fall testing, the 
children’s age range was 4 years 11 months to 6 years 0 
months. In the spring the children ranged in age from 
5 years 6 months to 6 years 7 months. Within each 
testing period subjects were administered a paired- 
associate learning task under either induced imagery 
(imagery) or regular (control) instructions. 


Materials and Procedure 


Subjects were presented with pictorial paired asso- 
ciates (paired line drawings of common objects). Im- 
agery subjects were told to make up a mental picture of 
the two objects in each pair doing something together 
in order to remember that they “go together.” Control 
subjects were told to try hard to remember the objects 
in each pair, but they were provided with no specific 
learning strategy. This type of control procedure is the 
one typically used in studies of children’s associative 
learning (M. Pressley, 1977; Rohwer, 1973). 

Two practice items (cat-apple and shoe-basket)
were presented to each subject along with instructions 
appropriate for the subject’s condition. The practice 
test consisted of presenting the subject with a picture 
of a cat and then a picture of a shoe. For each of the
stimulus pictures, the subject was required to select the 
correct response picture from a two-item array (con- 
sisting of a basket and an apple). All subjects seemed 
to comprehend the instructions and what they were 
required to do in the task. 

During the actual study trial, each subject was pre- 
sented with 12 stimulus-response pairs of pictured 
objects. The pairs were presented at a 12-sec rate, in
a different random order for each subject. The test was
given immediately after presentation of the 12 pairs,
and consisted of a presentation of each of the 12 stim-
ulus items (in different random order for each subject). 
The subject was required to point to the response item 
with which each stimulus item was originally paired 
during the study trial. The 12 responses were displayed 
in a 3-item × 4-item array. The position of each re-
sponse item in the array was randomly determined, but 
only one array was used throughout the study. The 
subject was given up to 15 sec to respond to each item 
during testing. 


Results 


Table 1
Summary Data by Experimental Condition and Time of Testing

                    Fall                  Spring
Summary
measure       Imagery   Control     Imagery   Control
N               31        29          32        33
M              4.90      2.21        6.53      3.18
SD             3.73      2.42        3.49      2.05

The mean number of correct recognitions
(out of 12), as a function of experimental
condition and time of testing, is presented in
Table 1, along with the corresponding standard
deviations. In both testing periods,
imagery subjects outperformed control
subjects (both ts > 3.29, ps < .01). That
imagery facilitation was not uniformly pro- 
duced across all subjects in the imagery 
condition is revealed by an interesting 
characteristic of the associated distributions, 
however. In both testing periods, two dis- 
tinct groups of imagery subjects were evi- 
dent—those who appeared to benefit from 
the imagery strategy and those who did not. 
Bimodality of this sort was not manifested 
in the control distributions, and because of 
this, imagery subjects exhibited greater 
variability in both testing periods (both Fs 
> 2.36, ps < .05). It is worth noting that 
increased variability attributable to bim- 
odality is typical of what can be expected 
during a cognitive-developmental transition, 
in which large individual differences in the 
ability to execute a particular cognitive 
strategy are exhibited (Levin, 1976; Wohl- 
will, 1973). 
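
The variance comparison reported above can be checked roughly from the standard deviations in Table 1. The following is a minimal sketch, assuming that each reported F is a simple ratio of the imagery variance to the control variance (the text does not name the exact test); the function name is illustrative and not the authors'.

```python
# Sketch only: assumes each reported F is the ratio of the two sample variances.
def variance_ratio(sd_imagery, sd_control):
    """Return the imagery-to-control variance ratio from two standard deviations."""
    return (sd_imagery / sd_control) ** 2

# Standard deviations and sample sizes taken from Table 1.
fall_f = variance_ratio(3.73, 2.42)    # df = (31 - 1, 29 - 1)
spring_f = variance_ratio(3.49, 2.05)  # df = (32 - 1, 33 - 1)

print(f"fall F(30, 28) = {fall_f:.2f}")
print(f"spring F(31, 32) = {spring_f:.2f}")
```

Under that assumption the smaller of the two ratios is about 2.38, which agrees with the statement that both Fs exceeded 2.36.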

There are several reasons why one should 
accept the variance differences between 
imagery and control subjects as real, rather 
than as artifacts produced by “floor” effects 
in the control condition. First, and most 
importantly, there was sufficient variation 
in the control learning data for significant 
relationships to emerge with at least one of
the individual-differences variables selected
for this study, as will be seen shortly. Second,
if the scores in the control condition
were artificially depressed (especially during
fall testing), then the mean increase from fall
to spring in that condition would certainly
be accompanied by a variance increase;
it is not (see Table 1). Finally, even though
some skewness was apparent in the control
condition, as mentioned earlier the variability
differences were likely attributable
to the bimodal character of the imagery
distributions.


To isolate the respective contributions of 
age and school experience to learning per- 
formance in the imagery and control condi- 
tions, separate multiple-regression analyses 
were conducted. This approach was se- 
lected on the basis of its greater statistical 
power in comparison with the usual analysis-of-variance
approach in which age would
comprise a factor containing two or more 
levels (Cronbach & Snow, 1977). The ap- 
proach was chosen also because of its ap- 
propriateness for evaluating separate correlational
patterns within experimenter-
manipulated conditions (Ghatala, Levin, & 
Subkoviak, 1975). The two major predictor 
variables consisted of the age of the child and 
the amount of formal school experience (fall 
= none or 0, spring = some or 1) when tested.
The contributions of these variables were 
examined after removing the effects of the 
child's general ability as reflected by teachers’
ratings (on a 5-point scale) of their stu-
dents’ academic achievement—ranging from 
poor or failing (1) to exceptionally good (5). 
The major question of interest was whether 
age would predict imagery subjects’ learning
performance independently of amount of
schooling and general ability.

Based on the two multiple regression 
analyses, straightforward conclusions were 
reached for subjects in the two conditions.
First, there definitely was a relationship 
between the three predictor variables (rated 
general ability, age, and school experience) 
and learning performance in both conditions. 
For control subjects, R = .46, F(3, 58) = 5.29,
p < .01. For imagery subjects, R = .52, F(3,
59) = 7.31, p < .01. And in both conditions,
rated general ability was significantly related
to performance (partial rs = .40 and .35 in
the control and imagery conditions respectively,
both ps < .01). Thus, in both conditions,
children whom teachers perceived as
good learners tended to perform best on the
learning task. Note that these significant
correlations (which are of about the same
magnitude) suggest that there was indeed
sufficient performance variability among
control subjects for relationships to emerge,
if present.

Apart from the similar correlations between
general ability and learning that were
noted in both conditions, differentiated
correlational patterns were also obtained.
In the imagery condition, age was signifi- 
cantly related to performance (partial r = 
.26, p < .025), whereas in the control condi- 
tion it was not (partial r = .03, p > .20). In 
neither condition was amount of school ex- 
perience (time of testing) a significant pre- 
dictor when the other variables were con- 
trolled (both ps > .05). 
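
A partial correlation of the kind reported here removes the linear influence of the other predictors from both age and performance before the two are correlated. The sketch below is illustrative only: it uses the standard recursive formula for partial correlations, the helper names are hypothetical, and the eight data rows are invented for demonstration rather than taken from the study.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, controls):
    """Correlation of x and y with each control variable partialed out."""
    if not controls:
        return pearson(x, y)
    *rest, z = controls
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical values (NOT the study's data): age in months, rated ability (1-5),
# schooling (0 = fall, 1 = spring), and number correct out of 12.
age = [60, 63, 66, 69, 72, 75, 70, 78]
ability = [2, 4, 3, 5, 1, 3, 4, 2]
schooling = [0, 0, 0, 0, 1, 1, 1, 1]
score = [2, 5, 4, 8, 3, 7, 8, 6]

r_age = partial_corr(age, score, [ability, schooling])
print(f"partial r for age = {r_age:.2f}")
```

Running the same function separately on the imagery and control samples would give the two condition-specific partial correlations that the paragraph above contrasts.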


Discussion 


The pattern of results accords well with 
earlier cross-sectional findings that imag- 
ery-generation competence develops with 
age. The present data are consistent with 
the conclusion that the early development of 
the ability to profit from an associative- 
learning induced imagery strategy is more a 
function of maturation than educational 
experience. This conclusion is based on the 
finding that age emerged as a predictor of 
learning performance only within the im- 
agery condition. If learning performance 
just generally improved with maturation, 
then age would have predicted performance 
in both the imagery and control conditions, 
which it did not. Two additional comments 
strengthen this interpretation: First, gen- 
eral ability was related to performance 
among control subjects, thereby demon- 
strating that there was variance in that 
condition that could be explained. The 
second comment is based on an as yet un- 
mentioned analysis of the data for fall 
subjects only (i.e., those for whom formal
instruction had not yet taken place): Consistent
with the partial correlation results,
age was significantly related to performance 
in the imagery condition, but not in the 
control condition. These findings and those 
showing no relationship between educational 
experience per se and learning in the imagery 
condition should lay to rest any speculations 
that the development of imagery-strategy 
competence during this transition period is 
educationally based (M. Pressley, 1977;
Reese, 1974). This is not to deny the possi- 
bility that experiential factors other than 
schooling may have contributed to the 
present pattern of results. Certainly age 
covaries with many experiential variables 
(see Bandura, 1977; Wohlwill, 1973), and 


Under cued and noncued conditions in three sessions held 1 week
of 28 female students (18-25
passages
age-related differences in recall and whether living an era facilitated


the eM recall of events of that
were no overall significant age differences
era may slightly help recall. anger ns
54) = 11.46, p < .005, indicated

particularly of the more difficult prose
ences in recall after 1 week on any of the

Do adults of different ages differ in their 
ability to learn and recall verbal material? 
With the current emphasis on
education, return to full-time education after
raising a family, and midcareer change in
occupation, this question is more relevant
than it was even a decade ago. The few
studies that have compared different
cohorts have left us with answers that are
far from clear.

Studies to date that addressed this ques- 


lain conditions. 
(1966) compared the performance of adults 
by decade from age 20 years to 70 years on 
recall and recognition of word lists. 
steady decline in free recall 
With increasing age but no age-related dee- 
t in 
"pared a group of mean age 20 years to 
"he of mean age 76 years on recall of a cate: 
en — 
did significantly better 
"all, but that the difference virtually dis- 
The many studies of age-related intellectual
functioning and the popular interpretation
of those studies certainly will not as-
suage the fears of mature people. Schaie 
(1974) has assailed the conception of a uni- 
versal decline across all intellectual abilities 
as a myth, but Horn and Donaldson (1976, 
1977) have attacked the assailing on both 
theoretical and empirical grounds. In a 
lively debate, Baltes and Schaie (1976; 
Schaie & Baltes, 1977) have vigorously de- 
fended their position. Both sides present 
solid arguments; the issue is far from settled. 
Yet, it must be noted that Schaie and Baltes 
state clearly that there is some decline in 
some abilities at least by the seventh, eighth, 
and ninth decades of life, and that Horn and 
Donaldson state that any decline before the 
seventh decade is in abilities not very closely 
related to the abilities that predict success 
in school work. Both sides to the recent 
controversy would concede that the middle 
aged have not declined since their youth in 
whatever intellectual abilities correlate 
highly with success in school. Measures that 
are essentially IQ are not, however, direct 
measures of rate of learning. 

It is conceivable that middle-aged people 
might outperform young adults in recalling 
recently learned material if the material were 
meaningful prose. Bower (1976) proposed 
that adults, as a result of experiencing nu- 
merous stories over a lifetime, might have 
acquired an abstract framework that would 
aid them in interpreting and constructing 
new stories and thus lead to better under- 
standing and recall of stories. Likewise, life 
experiences may provide such a framework
for learning and recalling any meaningful
prose. 

The study herein presented compares the 
performance of young women (18 to 25 years 
old) to that of mature women (35 to 55 years 
old) in recalling two just-learned historical 
passages. One passage dealt with an era
before either group was born; the other 
passage with an era through which the ma- 
ture women had lived but before the young 
women had been born. 

This study differs from previous devel- 
opmental comparisons in five ways: (a) All 
participants were women currently or recently
students in higher education (women
constitute a large proportion of the older
student population in higher education); (b)
meaningful prose, approximating material
found in college textbooks, was used because 
of its relevance to real-life academic de- 
mands; (c) the effect of life experience, that 
is, having lived through an era, on learning 
was assessed; (d) both an immediate and a 
one-week delayed recall permitted an age- 
related comparison of retention; and (e) cued 
recall of meaningful prose was compared to 
noncued or free recall. In addition, the ex- 
tent to which the judged importance of dif- 
ferent parts of the passage affected retention 
at either age level was addressed. 


Method 


Subjects 


A total of 62 female subjects agreed to participate in 
the study, 30 in the 18-25-year age group and 32 in the 
35-55-year age group. Two subjects from the 18-25- 
year age group who did not complete all recall tasks were 
eliminated from the final analysis. Both, when con- 
tacted, said they simply forgot to come for a session. 
Their initial scores suggest they were not atypical in 
learning and recall from the rest of their cohort. Four
subjects from the 35-55-year age group, 2 who did not 
complete all recall tasks and 2 who did not fall within 
the age criterion, were also eliminated from the final 
analysis. Of the 2 who did not complete all tasks, 1 
missed the last session because of an out-of-town family 
illness; the other stated during the second session that
she did not like the task. What scores were obtained
were typical of the cohort. Of those outside the age
range, 1 was below 35 and 1 above 55 years old, but we 
preferred not to exclude them from the sessions. Thus, 
the final sample consisted of 28 women in each group. 
Though one always worries about attrition problems,
the small rate of attrition would not seem to distort the
obtained data. There are always selection problems.
The volunteers in both cohorts responded to a general
request made in their schools. Since most were in
school or recently had been, the results can be generalized
only to adults attending school.

All subjects in the 18-25-year age group were currently
students in higher education or had earned degrees
within the previous 18 months. Four held BA
degrees, while 24 were BA candidates. Of the 4 holding
the BA, 1 was an MA candidate and 3 were not in a degree
program. Of the 24 candidates for the BA, it is not
known (often not by the individual) whether they would
attend graduate school. The mean age of this group was
20.7 years. Twenty-four subjects in the 35-55-year age
group were currently students in higher education or
had earned degrees within the previous 12 months.
Twenty-one held BA degrees, while 6 were BA candidates.
One was not a degree candidate and had no
degree. Of the 21 holding the BA, 5 were MA
candidates and 8 were nondegree students; 3 had recently
earned an MA and 5 a BA. Comparability of groups
was determined more by direct measure of
ability (Vocabulary subtest of the Wechsler Adult Intelligence
Scale [WAIS]) than demographic information,
since the older group was a bit more highly educated.
The mean age of this group was 45 years.


Materials 


Passages. The prose passages used in this study were
selected to typify materials found in introductory college
textbooks. Two passages recounting historical
events were condensed and paraphrased from The
Americana Annual 1957: An Encyclopedia of the
Events of 1956 (s.v. “Suez Canal”), An Encyclopedia
of World History (2nd ed., s.v. “Russia, 1801-1914”),
The New International Yearbook: A Compendium of
the World's Progress for the Year 1910 (s.v. “Finland”),
and Midgley's (1965) Social Studies/American History.
One passage described the Suez crisis in 1956. (The
older subjects had lived through that historical period
as adolescents and young adults.) The other passage


Both passages were independently divided into idea
units by the second and third authors, who served as


author. The Suez passage was divided into 171 idea
units that included 38 three-point, 36 two-point, and
97 one-point ideas. The Finnish passage was divided
into 164 idea units that included 41 three-point,
44 two-point, and 79 one-point ideas. The passages had
to be divided into idea units for purposes of quantification.
Weighting was necessary because it could


more important information, but they might have
a lower nonweighted score because they retrieved less
al information.
Comprehension questions. A set of 12 short-answer
comprehension questions was prepared for each of the
two passages for use in a cued recall task.
Biographic questionnaire. A biographic questionnaire,
divided into two parts for use at two experimental
sessions, covered the following aspects of subject
background: age, previous and current education, occupation
of parents and spouse, previous study of history,
and prior knowledge of the events of the two passages.

WAIS Vocabulary subtest. The Vocabulary subtest
of the Wechsler Adult Intelligence Scale was administered
in written form in order to verify comparable
levels of verbal aptitude among the two age groups.
Both groups scored over one standard deviation above 
the mean. The mean for the mature women was 14.5 
(SD = 1.53); for the young women, the mean was 14.0 
(SD = 1.91). A comparison of mean scaled scores revealed
no statistically significant difference between
groups (t = 1.08, p > .05). The Vocabulary subtest was 
used because it correlates highly with the full scale. 
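
The reported t can be reproduced approximately from the summary statistics alone. This is a minimal sketch, assuming a pooled-variance independent-samples t test and the final sample of 28 women per group; the source does not say which variant of the test was used, and the function name is illustrative.

```python
from math import sqrt

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t (pooled variance) computed from summary statistics."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2

# WAIS Vocabulary scaled scores: mature women vs. young women (n = 28 each, assumed).
t, df = pooled_t(14.5, 1.53, 28, 14.0, 1.91, 28)
print(f"t({df}) = {t:.2f}")
```

With these inputs the sketch gives a t of about 1.08 on 54 degrees of freedom, in line with the value reported in the text.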


Procedure 


The subjects were tested in small groups at three 
experimental sessions occurring exactly 1 week apart. 
The order of the passages was counterbalanced. 

Session 1. Folders containing the materials used in 
Session 1 were distributed randomly, with half of the 
subjects receiving the Suez passage and the other half 
receiving the Finnish passage. 

The subjects were told that they would have 20 
minutes to learn the passage. They were encouraged 
to underline or take notes as desired and were informed 
that they would subsequently be asked to recall in 
writing the content of the passage and to answer some 
specific questions about it. 

At the end of the 20-minute study period, the subjects
spent 5 minutes filling out the first part of the biographic
questionnaire. They were given two sheets of lined
paper and requested to recall as much of the passage as 
they could remember, using any organization and in- 
cluding as much detail as possible. They were told that 
they could have unlimited time for the free-recall task. 
Immediately after completing the free-recall task, each 
subject wrote answers for the 12 comprehension questions
based on the passage she had learned. Unlimited
time was allowed for this cued-recall task. Before their
departure, the subjects were requested not to review or 
discuss the passage during the subsequent week. 

Session 2. Upon arrival, the subjects were allowed 
20 minutes to recall in writing the passage they had 
learned the previous week. The subjects were satisfied
that the time was sufficient. The remainder of Session
2 followed the procedure of Session 1, with subjects
learning the historical passage they had not previously
studied. Following a 20-minute learning of the new
passage, they completed the second part of the biographic
questionnaire. The recall tasks of Session 2
were identical to those of Session 1.

Session 3. As in Session 2, the subjects were allowed 20
minutes to recall in writing the passage they had
learned the previous week. This delayed recall task was
followed by the administration of the Vocabulary subtest
of the WAIS. The subjects were also invited to
write comments about their own learning and recall of
the two passages.


Scoring and Interrater Reliability 


The number of unweighted and weighted idea units
recalled for each task on each passage was tallied for each
subject. Half of the recall protocols were scored by each
of two scorers, with counterbalancing for age group
and type of recall task.
A measure of interrater reliability was obtained by
randomly selecting 29 protocols, distributed approximately
evenly among the three types of recall task, for
independent scoring by both scorers. The correlation 
between these independent scores was .99 for both un- 
weighted and weighted values. 


Results and Discussion 


Cued Recall 


The 12 comprehension questions required 
short answers, each of which could contain 
several idea units. Scorers set criteria for 
both weighted and unweighted scores. The 
answers were scored independently by the 
two scorers. Note that the correlation between
the scorers was .99 for both weighted and
unweighted scores.

A comparison of mean scores, both un- 
weighted and weighted, for comprehension 
questions (cued recall) revealed no differences
between age groups for either passage.

On the Finnish passage, M = 55.8 (SD = 
13.4) for the younger, and M = 54.0 (SD = 
17.6) for the older when weighted scores were 
used. Respective means were 51.7 (SD = 
12.7) and 52.9 (SD = 13.2) on the Suez pas- 
sage. With unweighted scores, the compa- 
rable means were 25.1 (SD = 6.3) versus 24.5 
(SD = 8.2) and 27.2 (SD = 6.8) versus 27.7
(SD = 6.8). These data were not analyzed
further. 


Free Recall 


Table 1 contains the unweighted means 
and standard deviations for immediate and 
delayed recall for each passage and across 
passages. The correlation between total
unweighted and weighted scores for immediate
and delayed free recall was found to be
.99; therefore, weighted scores, representing
idea units ranked by importance, are not 
displayed and were not further analyzed. 
The relationship between weighted and 
unweighted scores indicates that people do 
not differ in ability to retrieve important and 
all information. 

The three-way mixed-design analysis of 
variance for unweighted values, using age 
group (younger or older) as a between- 
subjects variable and passage (Suez or 
Finnish) and time of free recall (immediate 
or delayed) as within-subjects variables, 
showed significant main effects for passage, 
F(1, 54) = 156.89, p < .001, and time of re- 
call, F(1, 54) = 310.48, p < .001. Of greater 
interest to the study was the finding of no 
main effect for age group. That is, overall, 
the older and younger women did not differ 
significantly in the amount recalled.

There was an Age Group × Time of Free
Recall interaction, F(1, 54) = 11.46, p < .005.
Inspection of means in Table 1 seemingly
leads to the conclusion that the younger
women lose relatively more than the mature
across the 1-week interval. None of the
other interactions was significant (α =
.05).

Since an initial question asked if life ex-
perience, that is, living through an era, would
have an effect on learning, t tests on the
means between age groups were performed
for each passage on each recall. There were
no statistically significant differences be-
tween the mature and the young on either
recall of the Suez passage or on the delayed
recall of the Finnish passage. The young 
recalled significantly more than the mature 


Table 1
Means and Standard Deviations of Idea Units Recalled in Unweighted Raw Scores

                       Suez passage      Finnish passage    Across passages
Age group       n      M       SD        M       SD         M       SD

Immediate free recall
18-25 years     28     80.14   20.91     63.04   16.87      71.50   20.71
35-55 years     28     72.50   24.26     51.79   21.57      62.14   25.03

Delayed free recall
18-25 years     28     55.21   16.60     40.32   11.55      47.77   16.04
35-55 years     28     54.89   22.65     37.11   17.24      46.0    [illegible]

Note. For the Suez passage, the maximum score possible was 171; for the Finnish passage, the maximum score possible was [value missing in source].




on the immediate recall of the Finnish pas- 
sage (t = 2.17, p < .05). 

Because the passages were slightly dif-
ferent in length, the scores for each passage
were standardized by passage using the z
score transformation (see Table 2). This
transformation eliminated the fairly unin- 
teresting main effect due to passage, of 
course; but as a result of the transformation, 
any concern about contamination coming 
from differences in passage length or diffi- 
culty of passage was removed. The analysis 
of variance performed on the z scores yielded 
the expected main effect for time of recall, 
F(1, 54) = 224.46, p < .001, and the same Age
Group × Time of Recall interaction, F(1, 54)
= 5.60, p < .02, as was obtained with the
nontransformed data. Multiple t tests
comparing age groups on each passage for
each recall again revealed a statistically
significant difference between age groups 
only on the first recall of the Finnish passage 
(t = 2.11, p < .05). 
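
The standardization step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure: the function name `standardize_by_passage` and the sample scores are hypothetical, and the use of the population standard deviation is an assumption, since the article does not say how the z scores were computed.

```python
from statistics import mean, pstdev


def standardize_by_passage(scores):
    """Convert raw recall scores to z scores separately for each passage.

    `scores` maps a passage name to a list of raw scores pooled over
    age groups and recall occasions (hypothetical layout).
    """
    z = {}
    for passage, values in scores.items():
        m = mean(values)
        s = pstdev(values)  # assumption: population SD; a sample SD is equally plausible
        z[passage] = [(v - m) / s for v in values]
    return z


# Hypothetical raw scores, invented purely for illustration.
raw = {
    "Suez": [80.0, 72.0, 55.0, 54.0],
    "Finnish": [63.0, 52.0, 40.0, 37.0],
}
z_scores = standardize_by_passage(raw)
```

After this transformation every passage has mean 0 and unit spread, which is why the main effect for passage disappears while the time-of-recall and age effects within each passage are preserved.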

The principal finding was the absence of
a significant main effect for age group. Al-
though the mean recall scores appeared to be
slightly lower for the older group—an ob- 
servation consistent with previous experi- 


prose material, the overall performance of 
the older group was not significantly differ- 
ent from that of the younger group. 

The significant main effect for passage
indicated that the two prose passages dif-
fered, and the main effect for time of
recall indicated the expected difference be-
tween immediate and delayed recall. A
comparison of mean recall scores in Table 1
suggests that the Finnish passage was more
difficult than the Suez passage for both age
groups (the difference in means is far greater
than the difference in passage length) and
that immediate recall was superior to de-
layed recall for both age groups.


Table 2
Means and Standard Deviations of Idea Units Recalled in Standardized Scores (z scores)

                       Suez passage      Finnish passage    Average across passages
Age group       n      M       SD        M       SD         M       SD

Immediate free recall
18-25 years     28     .61     .88       [?]     .86        .68     .86
35-55 years     28     .29     1.02      .18     1.10       .24     1.06

Delayed free recall
18-25 years     28     [?]     [?]       [?]     .59        -.42    .64
35-55 years     28     -.46    .95       [?]     .81        -.57    .89

Note. [?] marks an entry that is illegible in the source.


There are several possible explanations for
the fact that the significant age decrement
reported in previous studies was not sub-
stantiated here. First, both age groups were
composed of students, and both were cur-
rently familiar with academic learning.
Second, neither group was put under a time
constraint for recall tasks. (Although only
20 minutes were allowed for delayed recall,
almost all subjects of both age groups said
that this interval was sufficient.) Third, the
experimental material consisted of prose
passages rather than the lists of discrete
items used in many previous developmental
comparisons.

A significant Age Group × Time of Free
Recall interaction indicated that recall per-
formances for immediate and delayed free
recall differed between the two age groups. A
greater difference was found between the
two age groups on immediate than on de-
layed free recall, with the higher scores at-
tained by the younger group. A possible
explanation for this finding is that the
younger subjects were superior in the ac-
quisition of new material. There is some
previous experimental evidence (Rabbit,
1968; Talland, 1968) for the notion that older
adults do not acquire new information as
rapidly or efficiently as younger adults.
Since learning time was limited in the
present study, the younger group may have
experienced an advantage in speed or effi-
ciency during the acquisition stage, which
was reflected in a superior performance on 
immediate recall. An alternate explanation 
is that acquisition was the same in the two 
groups, but during the 5 minutes between 
learning and recall, more of what was learned 
became nonaccessible to members of the 
older group. The two explanations cannot 
be separated in this experiment. Theoret- 
ically, the distinction may be interesting; 
educationally, it would seem immaterial at 
this stage of our knowledge. 

More relevant for real academic learning 
is the finding that recall among the two age 
groups was similar after a week’s delay. 
Since real-life learning situations seldom 
require immediate recall of learned material, 
delayed recall may be a more important in- 
dicator of the academic performance of older 
students. Although younger students may
be better “crammers,” older students can
remember as effectively over time.

The difference on immediate recall of the 
Finnish passage and lack of difference with 
the Suez passage revealed by the t tests must 
be interpreted cautiously. It is intuitively 
appealing to believe that living through an 
era gives one an episodic memory (Tulving,
1972) that can serve as a general frame of
reference for organizing information about 
that era during storage and thus facilitate 
retrieval. This may be the case here. Epi- 
sodic memory, then, would have given the 
mature an edge that counterbalanced the 
young’s slightly greater ability to retrieve 
just-learned material immediately after 
learning. However, it is a momentary edge,
just as the young’s better immediate re-
trieval is a momentary edge. A week later,
there was no difference on either passage.
An alternate explanation is that the young 
can do slightly better on immediate recall 
with more difficult material. If so, it again 
is not an educationally significant advantage 
because it is gone in a week. 

Many mature women are anxious at the 
thought of returning to college. If the 
question is “Can I still compete and learn as 
fast?” the answer is a definite “yes.” Cer-
tainly, the results here indicated that it is
“yes” at least after the considerable practice 
in university situations experienced by the 
mature women in this study.


Self-Esteem in Open and Traditional Classrooms


Self-esteem of 350 seventh-grade students from open and traditional elemen-
tary schools was measured according to the Coopersmith Self-Esteem Inven-
tory and corresponding Coopersmith Behavior Rating Form. The seven open
and eight traditional classes used in the study were rated according to the
Walberg-Thomas Scales and were found to be significantly different treat-
ment groups. No differences in self-esteem were found between the groups,
however, nor were main effects found for the factors socioeconomic status, IQ,
or sibling position. The factor sex did show a significant difference. These
findings indicate that skepticism is necessary regarding claims that the open
school format fosters self-esteem. Additional data that use consistent mea-
suring instruments and vary subject populations are needed.


Requests for reprints should be sent to Wendi H.
Klass, who is now at the Diagnostic Teaching Center,
Denver Public Schools, 3050 Jackson Street, Denver,
Colorado 80205.


Extensive literature on the open class-
room format is accumulating (e.g., Barth,
1972; Dennison, 1969; Featherstone, 1967,
1971; Kohl, 1969; Rathbone, 1971; Silber-
man, 1970). Unfortunately, only a small
part of this literature includes results of
objective investigations of the advantages
and disadvantages of this format. Although
these studies have provided a foundation for
further research, the findings have been in-
consistent (see, e.g., Blumenthal & Reiss,
1975; Minuchin, Biber, Shapiro, & Zimiles,
1969; Wilson, Stuckey, & Langevin, 1972;
McPartland & Epstein, Note 1). Such in-
consistencies are particularly pronounced in
studies on the affective impact of open ver-
sus traditional formats.

Discrepancies in claims and findings may
be due, in part, to differences in (a) the
characteristics of the open environments
under study, (b) the type of affective be-
havior being measured, or (c) the experi-
mental design and data analysis.

Concerning the “openness” issue, various
definitions have encompassed architectural
design, time scheduling, or teacher philoso-
phy. Perhaps some comparability of find-
ings would be possible if common measures
of openness were used in these studies. The
Walberg-Thomas Scales (Walberg &
Thomas, 1972) could provide such a mea-
sure. Nevertheless, open school settings
vary widely in actual practice. This situa-
tion threatens the external validity of any
research in the field but is unavoidable at
this time.

As the dependent variable, self-esteem 
could represent affective state. Cooper- 
smith (1967) held that self-esteem refers to 
the self-evaluation that the individual makes 
and customarily maintains, derived in part 
from the individual's interactions with the 
environment and from the amount of re- 
spectful, accepting, and concerned treatment 
the individual receives. 

The type of self-esteem being measured 
can influence the results. Franks, Marolla, 
and Dillon (1974) found significantly higher 
self-esteem scores in an open school group 
using competency-based measures. Neither
Ruedi and West (1973), nor Allen (1974), nor 
Wright (1975), however, found evidence of 
self-esteem differences between the two 
treatment groups (open vs. traditional).

A favorable attitude toward school is 
sometimes cited as a result of the open 
classroom, and evidence showing gains in 
this area has been presented by Tuckman, 
Cochran, and Travers (1974) and by 
Groobman, Forward, and Peterson (1976). 
The latter study investigated self-esteem as 
well but found no significant differences. 
Favorable school attitude may be valued in 
its own right, but it is not the same as self- 
esteem and should not be confused with it. 




Our research investigated the question 
“Do children from open school environments 
show greater self-esteem?” The design of 
the study differs from previous research in 
that the independent variable (open class- 
room) was operationalized, and several 
possible confounding variables were made 
factors in order to isolate to the greatest 
possible extent the relationship between 
open school environment and self-esteem. 


Method 


Subjects 


Subjects were 350 seventh-grade students from 
Westminster, Colorado, a suburb of Denver. All were 
attending a traditional format school but had come the 
previous year from five different elementary schools, 
two of which were open format and three of which were 
traditional format. All subjects had been exposed to 
their respective treatments for at least one academic 
year. 


Materials 


The independent variable was represented by 15 
sixth-grade teachers, each of whom was rated using the 
Walberg-Thomas Scales (Walberg & Thomas, 1972). 
The scales consisted of a 50-item Teacher Questionnaire
and a 50-item Classroom Observation Scale, designed 


1 were obse 
analysis of variance (ANOVA) showed them to be sig-
nificantly different groups (see Results section).
The dependent variable, self-esteem, was opera-
tionalized as score on the Coopersmith Self-Esteem
Inventory (self-report version only; Coopersmith, 1967).
This scale consisted of 50 descriptive statements to
which the subjects responded by checking “like me” or
“unlike me,” plus a similar 8-item Lie scale embedded
within the inventory. The corresponding Coopersmith
Behavior Rating Form was also administered, although
it was not analyzed as part of the dependent variable
because we defined self-esteem as phenomenological,
making external evaluation irrelevant. The form
consisted of 13 descriptive statements with which the
homeroom teachers rated their students on a scale of
1-5 as to frequency of exhibiting behaviors mentioned.
(See Results section for data on this instrument from
the present research and from Coopersmith.)
Socioeconomic status (SES) was measured by a 32-
item self-report instrument developed at the Labora-
tory of Educational Research, University of Colorado,
and used by permission. This instrument is discussed 
in White and Hopkins (Note 2). 


Procedure 


The 15 sixth-grade teachers representing the inde- 
pendent variable were chosen because a majority of the 


WENDI H. KLASS AND STEPHEN E. HODGE 


subject population had been in their sixth-grade classes 
during the previous school year. The Walberg-Thomas 
Scales were completed with these 15 teachers, and sig-
nificant differences between open and traditional
groups were established. 

The Coopersmith Behavior Rating Forms were then 
distributed to the current homeroom teachers (seventh 
grade), who were asked to complete one form for each 
of their students. When these were returned, the 
teachers received copies of the Coopersmith Self-Es- 
teem Inventory, which they then administered to their 
students in a group, along with the previously men- 
tioned SES measure. 

This procedure prohibited teachers’ responses to the 
Behavior Rating Form from being influenced by stu- 
dents’ responses to the Self-Esteem Inventory. It also 
avoided experimenter bias, since these seventh-grade 
teachers did not necessarily know which elementary 
school group (open or traditional) the students were 
from and since the experimenter had no contact with 
students during the administration. Finally, the pro- 
cedure assessed the longer term, and potentially more 
important, effects of the open classroom format on 
students’ self-esteem. 

A randomized block design was used with sex, IQ,
SES, and sibling position as factors, in addition to 
treatment group. Information on the factors other than 
SES was taken from the students’ cumulative folders. 
Socioeconomic status was divided into three levels, with 
fairly equal n, based on the scores from the University 
of Colorado instrument (percentages in the high, mid-
dle, and low groups were 36%, 36%, and 28%, respec-
tively). IQ was divided into high, medium, and low
according to previously administered group test data.
Sibling position was also grouped into three levels:
oldest/only, youngest, or neither.

A series of two-way ANOVAs was conducted, crossing 
each factor individually with treatment. An SPSS 
(Statistical Package for the Social Sciences) program 
(Nie et al., 1970) was used. Factors were not crossed
with each other because although there was a large
enough number of subjects, there was also a large
enough number of factors to result in low cell frequen-
cies and possibly empty cells. Thus, possible interac-
tion effects between factors are not known. The anal-
ysis did yield main effects for each factor and the pos-
sibility of interaction effects with treatment.


Results 


The present study attempted to deter-
mine if a difference existed between the
self-esteem scores of children who had been
in an open school setting and the scores of
those who had been in a traditional school
setting. Evidence of a significant difference
between the two groups was not found.
Analysis of variance, with score on the Coo-
persmith Self-Esteem Inventory self-rating
form as the dependent variable, showed
neither main nor interaction effects for
treatment group.


Neither main nor interaction effects were
found for the factor socioeconomic status,
F(2, 265) = .058, p < .999, nor for SES ×
Treatment, F(6, 265) = .379, p < .999.

For the factor IQ, neither main nor inter-
action effects approached significant levels:
for main effects, F(2, 287) = 1.080, p < .342,
and for IQ × Treatment, F(6, 287) = .796, p
< .999.

The two-way ANOVA on self-esteem
showed no main or interaction effects for the
factor sibling position: for main effects, F(2,
287) = .049, p < .999, and for Sibling Position
× Treatment, F(6, 287) = .110, p < .999.

Main effects were found for the factor sex.
Girls scored significantly higher on the
Coopersmith Self-Esteem Inventory than
boys, F(1, 293) = 3.508, p ≤ .059. There was
no demonstrable interaction effect with
treatment, F(3, 293) = 1.327, p ≤ .265. Sex,
then, appears to be the only factor considered in
the present research that influenced self-
esteem in a significant way.

Table 1 shows the mean, variance, and
number for both sexes and in the four levels
of treatment (representing 1, 2, 3, and 0 years
in the open classroom, respectively).
Homogeneity of variance was tested be-
cause unequal variances and unequal ns
between groups were crossed (larger paired
with smaller). Using Bartlett's test, χ²(7) =
[value illegible], indicating homogeneity of variance.


Table 1
Two-Way Analysis of Variance for Sex × Level of Treatment

                      Level of treatment
Data
characteristic     1         2         3         4

Male
  M                55.3      63.3      68.8      64.0
  σ²               272.0     273.2     310.3     265.5
  n                9         31        13        101

Female
  M                69.3      68.36     62.9      67.3
  σ²               175.0     209.5     321.1     234.2
  n                9         28        9         101

Note. For levels of treatment, 1 = 1 year in the open setting,
2 = 2 years in the open setting, 3 = 3 or more years in the open
setting, 4 = traditional school setting throughout elementary
school.


Test Analyses of the Instruments


A LERTAP analysis (see Nelson, 1974) was computed on the Walberg-
Thomas Scales, the SES measure, and the
Coopersmith Self-Esteem Inventory in order
to judge effectiveness of the instruments,
derive reliability scores, and view the dis-
tributions of the data.

The operational definition of the independent
variable, the open classroom, was score on
the Walberg-Thomas Scales. (Open and
traditional classes should score on opposite
ends of a continuum delineated by this in-
strument.) For purposes of the analysis, the
items were grouped into their eight
subscales as specified by the authors
(Walberg & Thomas, 1972). A Hoyt inter-
nal consistency reliability correlation for the
Teacher Questionnaire was .92 and for the
Classroom Observation Scale was .94.
Correlations of each of the eight subscales to 
the total test score ranged between .638 and 
.977 for both the Observation scale and the
Teacher Questionnaire, with the exception
of one subscale that did poorly on both for- 
mats and another subscale that was impos- 
sible to rate by observation alone. Walberg 
and Thomas (1972) did not offer comparable 
data, so no comparisons can be made. 

Table 2 shows the differences in mean 
openness scores between the two types of 
classrooms and the two methods of rating, 
using ANOVA. (The higher the score, the 
more closely the classroom fits the open 
school model of education.) For both Ob- 
servation and Questionnaire ratings, the 
open classrooms received significantly higher 
scores than the traditional classrooms, F(1,
20) = 6.447, p ≤ .019. An additional inter-
esting finding was that the method of rating
itself approached significance, F(1, 20) =
3.787, p ≤ .065. Teachers
rated themselves as more open
than the Observation scale rated them. This
difference was especially pronounced in the
traditional format classes. Groobman et al.
(1976) used this scale to differentiate open 
from traditional classes and obtained data 
that were congruent with the present find- 
ings, including the discrepancies between 
self-rating and observer rating.


Table 2
Mean Scores on Walberg-Thomas Scales: Classroom Format by Method of Rating

                      School
Rating method      Open        Traditional

Self
  M                137.36      116.7
  σ²               131.8       301.7
  n                7           5

Observer
  M                123.0       84.4
  σ²               225.3       127.8
  n                7           5

Taken together, the two Walberg-Thomas
scales demonstrated that the children in our 
sample had been exposed to distinctly dif- 
ferent educational formats and that the in- 
dependent variable (the open classroom) did 
exist. (Observation scale data must be in- 
terpreted with some caution, however. 
Daily variations in teacher behavior could 
negatively influence the stability of this
measure.)
The dependent variable was score on the 
Coopersmith Self-Esteem Inventory (Coo- 
persmith, 1967). The LERTAP (see Nelson, 
1974) on only the self-report version yielded 
correlations between the 50 individual items 
and the total test of approximately .2 to .4.
While these item correlations seem low, this 
is acceptable because (a) there are many 
items, (b) there are only two options per 
item, (c) the total reliability is high and item 
correlations are homogeneous, and (d) the 
test is an affective measure. Of a possible 
100 points on the test, the mean score of our 
sample was 82.07 (SD = 7.91). (Cooper- 
smith reported his mean score on 1,748 
subjects to be 70.1 for males and 72.2 for fe- 
males with SD = 13.8 and 12.8, respectively.) 
The standard error of measurement on the 
present data was 2.92, with Hoyt reliability 
estimated to be .86. (Coopersmith, 1967, 
reported test-retest reliability at a 5-week 
interval to be .88). Our mean, as stated 
above, was 82.07, with median = 82.10 and 
mode = 79.03.

Although the factor socioeconomic status 
did not achieve significance, possibly be- 
cause of the homogeneity of the suburb in 
which the children lived, the data fell into a
normal-appearing curve with a Hoyt reli-
ability of .58. The curve had the following
characteristics: M = 58.3, Mdn = 58.5, 
mode = 57.5, range = 44-76, SD = 5.79, SE 
= 3.68, and N = 350. 
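
The standard errors of measurement reported above can be roughly cross-checked with the classical formula SEM = SD × sqrt(1 − reliability). The sketch below assumes that formula applies here; it is not a description of what the LERTAP program actually computed, and the function name is hypothetical. With the rounded standard deviations and Hoyt reliabilities given in the text it yields about 2.96 and 3.75, close to but not identical with the reported 2.92 and 3.68; rounding of the published inputs may account for part of the gap.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


# Inputs are the rounded values quoted in the text.
sem_self_esteem = standard_error_of_measurement(7.91, 0.86)  # Self-Esteem Inventory
sem_ses = standard_error_of_measurement(5.79, 0.58)          # SES measure
```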


Discussion 


The research problem we considered was 
self-esteem in the open school—Would ob- 
jective investigation of self-esteem in open 
and traditional format classes yield signifi-
cant differences between groups?

Although we demonstrated that differ- 
ences in format did exist, using the 
Walberg-Thomas Scales as a definition of 
openness, significant differences in self- 
esteem were not found. No main effects 
were found for the factors SES, IQ, or sibling 
position, nor were there interaction effects 
between these factors and treatment group. 
Sex, however, did reach significance, as girls 
had higher self-esteem than boys in this 
sample of seventh-grade children. 

Past research in this area has suffered 
from inconsistencies in treatment and re- 
sults. One problem has been differences in 
characteristics of open classrooms being 
studied, including the possibility of no de- 
monstrable difference between treatment 
groups. The present research has tried to 
overcome that difficulty by using the 
Walberg-Thomas Scales to define and de-
scribe the concept. The 50-item Classroom
Observation Scale and the 50-item Teacher
Questionnaire derive a score for each class-
room under study, which allows the class-
rooms to be placed on a continuum of open-
ness and allows testing for significant dif-
ferences between groups. Using analysis of
variance with the Walberg-Thomas data
and reviewing previous research results, it
would seem that reliable differentiation
between treatment groups can be made and
was made in the present study.

Another source of experimental variability
has been the type of affective behavior
measured and difficulties in measuring it
reliably. Self-esteem was chosen for this
research, with the Coopersmith Self-Esteem
Inventory as the vehicle. It has been used
previously in research of this nature, and
Coopersmith himself has done extensive
validation work on it (see Coopersmith,
1967).

There is the possibility that the lack of 
significant difference between groups was 
due to the inability of the Coopersmith 
measure to detect actual differences. The 
reliability and validity data from the present 
sample, however, with the favorable LER- 
TAP analysis and the fairly large N (350), 
tend to discount this explanation. Even so, 
since the inventory is a pencil-and-paper 
measure of an hypothesized emotional state, 
the question remains. 

These findings indicate skepticism is 
necessary in viewing claims that the open 
classroom fosters motivation and “happi- 
ness.” In addition, achievement test data 
analyzed in the literature have shown that if 
open school children are learning more, they 
are not better prepared to take achievement 
tests (Lewis & Adank, 1975; Tuckman et al., 
1974). Some writers have implied that open 
school children have better attitudes toward 
school and enjoy it more (Barth, 1972; Sil- 
berman, 1973). If they are more motivated, 
eventually achievement could be expected 
to improve, or self-satisfaction (measurable 
on a personality characteristic such as self- 
esteem) could be expected to increase. 

Before the question of the advantage or 
disadvantage of the open school format can 
be resolved, additional data are needed, 
using consistent measuring instruments and 
varied subject populations. We have spec- 
ified a way to minimize inconsistencies by 
Measuring openness of classroom on a spe- 
cific but wide-ranging set of criteria with the 
Walberg-Thomas Scales. Further explo-
rations relating particular differences in
process to differences in outcome are need-
ed.


Primary Mental Abilities at Collective and Individual Levels


Kjell Härnqvist
University of Göteborg,


Scores in ability tests administered to pupils in Grades 4-9 were decomposed
into district, class, and individual components. Demographic characteristics
of neighborhoods, self-selection between academic and general programs, em-
phasis in instruction, and irregularities in test administration were used to explain
the variation between schools and districts. First-order factor analyses at the
individual level showed the usual Primary Mental Abilities test pattern. Sec-
ond-order analyses grouped the primary factors into one Power and one Speed
factor. This latter structure was found also in the analyses of class means, al-
though the processes behind the factors were likely to be different from those
operating at the individual level.


Cronbach (Note 1) has shown many im-
portant consequences of the choice of level
of analysis for data from classroom research.
The choice of overall-, within-, or between-
groups analysis for regressions, covariance
adjustments of treatment group means,
multivariate relations, and so on may lead to
dramatically different results. Cronbach
recommends methods of analysis that de-
compose the individual scores into compo-
nents for different levels of aggregation and
result in separate estimates for different
levels.

This article presents a secondary analysis
of test scores decomposed into district, class,
and individual components. The data were
originally collected for an investigation of the
development of individual differences in
ability test scores from Grade 4 through
Grade 9 (Härnqvist, 1960). A reanalysis
(Härnqvist, 1973) studied the stability of
profiles on test and factor level within and
between occasions. Both studies used the
individual scores as if there were no effects
of grouping in classes or school districts.


Data and Setting


The samples to be used in this analysis
comprise all classes in six grades in five
comprehensive school districts in Sweden,
chosen because of their coverage of practi- 
cally the whole school population in their 
areas. This was expected to keep demo- 
graphic variables out of the differences be- 
tween grades. On the average, there were 
480 pupils in each grade, roughly equally 
divided between boys and girls and distrib-
uted among 20 to 29 classrooms. About
90% of the pupils were of normal age for their
grade, that is, 11 years old in Grade 4
through 16 years old in Grade 9.

For these pupils, scores are available in 12 
tests. Ten of these were chosen as markers 
of five Primary Mental Abilities test factors 
(Verbal, Reasoning, Spatial, Numerical, and 
Perceptual Speed; Thurstone & Thurstone, 
1941). The remaining two tests seem to 
some extent to rely on Manual Speed and 
Dexterity. 

Within school districts, the pupils had 
been assigned to classes in different manners 
for Grades 4-6 and 7-9. In the elementary 
grades, the assignment was normally done 
before Grade 1 according to a neighborhood
principle; and demographic differences be-
tween neighborhoods, for instance, rural
versus more densely populated areas of the
communities, may have had an influence.
In Grades 7 and 8, the pupils could choose a
second foreign language and more academic
courses in mathematics and some other
subjects, or they could just take a general
curriculum. This differentiation was fur-
ther developed in Grade 9, where the choice
was between an academic, a general, and a
vocational line with different branches. In
Grades 7 through 9, pupils with similar 
choices were brought together in classes in 
one central school unit for the whole district. 
Classes in Grades 4-6 were taught by class- 
room teachers. Classes in Grades 7-9 were 
taught by subject-matter teachers. 


Method of Analysis 


Two types of analyses will be presented. In the first 
step, the overall variance in the scores of each test will 
be divided into the following three components: (a) 
variance between districts (D), (b) variance between 
classes within districts (CwD), and (c) variance between 
pupils within classes (PwC). These variances will be
expressed as percentages of the overall variance of the 
test. Computationally, this is done by means of 
transforming each pupil score into three components: 
district mean, the deviation of class mean from district 
mean, and the deviation of raw score from class mean. 
Variances are then computed for the three parts sepa- 
rately, with the number of pupils in the grade as N. 
This means that the estimates at all three levels are 
weighted in relation to the number of pupils and can be 
added to get the overall variance (see Cronbach, Note 
1, chap. 3). 
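
The decomposition described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the program used in the study: the function name `decompose_variance`, the record layout `(district, class_id, score)`, and the sample data are all hypothetical. Each pupil score is split into the district mean, the deviation of the class mean from the district mean, and the deviation of the raw score from the class mean; each part's variance is computed with the number of pupils as N, so the three parts add up to the overall variance.

```python
from statistics import mean


def decompose_variance(records):
    """Return the percentage of overall variance at the D, CwD, and PwC levels.

    `records` is a list of (district, class_id, score) tuples, one per pupil.
    """
    n = len(records)
    grand = mean(score for _, _, score in records)

    by_district, by_class = {}, {}
    for district, class_id, score in records:
        by_district.setdefault(district, []).append(score)
        by_class.setdefault((district, class_id), []).append(score)
    d_mean = {key: mean(vals) for key, vals in by_district.items()}
    c_mean = {key: mean(vals) for key, vals in by_class.items()}

    # Each component is summed over pupils, so groups are weighted by size.
    var_d = sum((d_mean[d] - grand) ** 2 for d, _, _ in records) / n
    var_cwd = sum((c_mean[(d, c)] - d_mean[d]) ** 2 for d, c, _ in records) / n
    var_pwc = sum((s - c_mean[(d, c)]) ** 2 for d, c, s in records) / n
    total = sum((s - grand) ** 2 for _, _, s in records) / n

    return {
        "D": 100.0 * var_d / total,
        "CwD": 100.0 * var_cwd / total,
        "PwC": 100.0 * var_pwc / total,
    }


# Hypothetical scores for two districts with two classes each.
sample = [
    ("A", "a1", 10.0), ("A", "a1", 12.0), ("A", "a2", 14.0), ("A", "a2", 18.0),
    ("B", "b1", 20.0), ("B", "b1", 22.0), ("B", "b2", 25.0), ("B", "b2", 27.0),
]
shares = decompose_variance(sample)
```

Because the class and district means are pupil-weighted, the cross-products between levels vanish and the three percentages sum to 100.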

The proportions of overall variance ascribed to the 
district and class levels correspond to correlation ratios 
calculated from the sums of squares in the analysis of 
variance (Winer, 1971, p. 118): 

$$\eta^2 = 1 - \frac{SS_{\text{within}}}{SS_{\text{total}}}$$

This is one among the possible measures of the vari- 
ability between groups. One alternative to this is cor- 
relation ratios calculated from mean squares (Winer, 
1971, p. 118), which can be written as 


$$\varepsilon^2 = 1 - \frac{N - 1}{N - k} \cdot \frac{SS_{\text{within}}}{SS_{\text{total}}}$$


This gives a lower and less biased estimate of the variability
between groups. The ratio between degrees of
freedom grows when the number of subgroups (k) increases
and when the number of individuals per group
(n) decreases. In the present data, the ratio is 1.01 to
1.02 for districts and varies between 1.04 and 1.08 for
classes.
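
The relation between the two correlation ratios can be illustrated with a short sketch. The function name `correlation_ratios` and the sample classes are hypothetical; the sums-of-squares ratio is written here as `eta_sq` and the mean-squares ratio as `eps_sq`.

```python
def correlation_ratios(groups):
    """Return (eta_sq, eps_sq, df_ratio) for a list of score lists.

    eta_sq uses sums of squares; eps_sq applies the degrees-of-freedom
    ratio (N - 1) / (N - k) to the within-groups share.
    """
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)

    df_ratio = (n_total - 1) / (n_total - k)
    eta_sq = 1 - ss_within / ss_total
    eps_sq = 1 - df_ratio * ss_within / ss_total
    return eta_sq, eps_sq, df_ratio


# Hypothetical data: four classes of two pupils each.
classes = [[10, 12], [14, 16], [20, 22], [18, 20]]
eta_sq, eps_sq, df_ratio = correlation_ratios(classes)
print(round(eta_sq, 4), round(eps_sq, 4), df_ratio)
```

With groups this small the degrees-of-freedom ratio is large (1.75). With hypothetical figures closer to a school setting, say 1,000 pupils in 40 classes, it would be 999/960, or about 1.04, which is the order of magnitude reported for classes.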

A second alternative measure is the intraclass correlation
coefficient, defined by

$$\rho = \frac{\sigma_\tau^2}{\sigma_\tau^2 + \sigma_\varepsilon^2}$$

(Winer, 1971, p. 244), where $\sigma_\tau^2$ and $\sigma_\varepsilon^2$ are variance
components for treatments and error, respectively.
One estimate of the intraclass correlation is

$$\hat{\rho} = \frac{MS_{\text{between}} - MS_{\text{within}}}{MS_{\text{between}} + (n - 1)\,MS_{\text{within}}}$$

(Winer, 1971, p. 258). The observed variation between
groups is reduced by the variation that can be expected
from random assignment to groups of size n from
a population with variance equal to the pooled within-groups
variance. In the present case, it does not matter
whether the collectives to which the random assignment 
is made are regarded as random or fixed in the vari- 
ance-component model. 
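
As a rough sketch of the estimate just described, the function below computes the intraclass correlation from between- and within-groups mean squares. It assumes equal group sizes; the name `intraclass_correlation` and the sample data are hypothetical.

```python
def intraclass_correlation(groups):
    """Estimate the intraclass correlation for k groups of equal size n:
    (MS_between - MS_within) / (MS_between + (n - 1) * MS_within).
    """
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    grand_mean = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]

    ss_between = n * sum((m - grand_mean) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


# Hypothetical data: four classes of two pupils each.
classes = [[10, 12], [14, 16], [20, 22], [18, 20]]
rho_hat = intraclass_correlation(classes)
print(round(rho_hat, 4))
```

When the group means do not differ at all, the estimate becomes negative, which shows how the expected effect of random assignment is removed from the observed between-groups variation.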

No simple algebraic relation exists between the cor- 
relation ratios and the intraclass correlation, but a 
number of examples calculated from the present data 
indicate that $\hat{\rho}$ comes very close to $\varepsilon^2$.

In spite of the statistically attractive properties of $\varepsilon^2$
and $\hat{\rho}$, proportions of observed variance, corresponding
to $\eta^2$ and $1 - \eta^2$, have been used in the present study for
two reasons. 

1. The decomposition in collective and individual 
levels will be done not only for the variance of each test 
as such but also for the covariance between tests, that 
is, not only for the diagonal elements of the variance-covariance
matrix but also for the off-diagonal elements.
For those, the decomposition cannot be based on other 
than observed covariation. In order to be consistent 
with that, the diagonal elements also should be based 
on observed variation without reduction for expected 
effects of random assignment. 

2. Following Cronbach (Note 1, pp. 4.2-4.4), one can
argue that classroom collectives should be regarded as 
having a fixed membership and a common history, the 
effects of which are shown in the variability between 
collectives. Such collectives cannot be replicated in a 
random assignment sense. It is the full variation be- 
tween collectives that should be analyzed and not only 
what remains after correcting for possible effects of 
random assignment. Such an argument seems to be 
more valid for what Cronbach (Note 1, pp. 1.17b-1.17c)
calls “group-caused effects” than for demographic effects,
where groups differ initially. In the latter case,
it may be meaningful to compare their variation also 
with a random assignment model. The distinction 
between these types of effects, however, is a matter of 
interpretation, and this must be based also on other 
information than the size of variation. The choice of
$\eta^2$ means that results reported here are descriptive for
the set of districts, classes, and pupils analyzed in this 
study. Generalization to other contexts cannot be 
based on formal statistical procedures. 

The analysis of the off-diagonal elements together 
with the decomposed variances will be done by means 
of factor analysis at the class and individual levels. 
Because of the small number of districts, and the rela- 
tively small variation found between them in the first 
step, the district level will be omitted here and district 
variation merged with class variation, so that each score
now is decomposed into only two parts: class means
and deviations of pupil scores from class means.


Cronbach (Note 1, p. 9.13) recommends not to use correlations
as a basis for such two-level factor analysis but
to use covariances brought to a common scale by means 
of the overall variances of the tests. This procedure 
may deserve a more detailed description. 

Covariance matrices are calculated at between-classes 
and within-classes levels. Since the differences in 
variance between tests to a large extent depend on the 
number of items, time limits, and similar arbitrary 
conditions, it is desirable to eliminate such variation. 
This is done by means of dividing each covariance with 
the product of the standard deviations of the overall
distributions for the pair of tests involved. The resulting
“scale-free” covariances for between and within
levels add up to the correlation in the pooled data. This 
procedure thus divides the overall correlation matrix 
into two additive parts: one for classes and one for 
pupils within classes. 
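
The additivity claimed here can be checked with a small sketch. It is an illustration only: the function name `scale_free_parts` and the scores for two tests in three classes are hypothetical. The between-classes and within-classes covariances are each divided by the product of the overall standard deviations, and their sum is compared with the correlation in the pooled data.

```python
from collections import defaultdict
from math import sqrt


def scale_free_parts(class_ids, x, y):
    """Return (between, within, pooled_r) for two tests x and y.

    between and within are covariances at the two levels, divided by the
    product of the overall standard deviations of x and y.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n

    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for c, xi, yi in zip(class_ids, x, y):
        entry = sums[c]
        entry[0] += xi
        entry[1] += yi
        entry[2] += 1
    class_mean = {c: (sx / m, sy / m) for c, (sx, sy, m) in sums.items()}

    cov_between = cov_within = 0.0
    for c, xi, yi in zip(class_ids, x, y):
        cx, cy = class_mean[c]
        cov_between += (cx - mean_x) * (cy - mean_y) / n
        cov_within += (xi - cx) * (yi - cy) / n

    sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    scale = sd_x * sd_y

    # Pooled correlation computed directly from the undecomposed scores.
    pooled_cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    return cov_between / scale, cov_within / scale, pooled_cov / scale


# Hypothetical scores on two tests for six pupils in three classes.
class_ids = ["a", "a", "b", "b", "c", "c"]
test_x = [1, 3, 4, 6, 8, 8]
test_y = [2, 3, 5, 4, 9, 7]
between, within, pooled_r = scale_free_parts(class_ids, test_x, test_y)
print(round(between, 4), round(within, 4), round(pooled_r, 4))
```

The same division applied to the variances gives the between- and within-classes proportions that sum to 1 on the diagonal.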

In the factor analyses, these two parts of the overall 
correlations are treated separately, with the proportion 
of variance of each test at between- and within-classes 
levels used as communalities. The between-classes 
proportions correspond to correlation ratios ($\eta^2$); the
within-classes proportions are $1 - \eta^2$. The use of these
proportions instead of communalities estimated from 
the matrix also serves the function of bringing the factor 
variances onto the common scale of the overall vari- 
ance. 

All analyses have been performed both for intact 
classes and for boys and girls separately. With one 
exception, only the results for intact classes are reported 
in this article.


Results 
Variance Proportions 


The relative proportions of the total 
variance accounted for at district and class 
levels are shown in Table 1. The propor- 
tions are averaged for tests belonging to the 
same hypothetical factor and for Grades 4-6 
and 7-9, respectively.

At the class level, Grades 7-9 have higher 
proportions than Grades 4-6, which was 
expected because of the different principles 
of class composition at the two stages. The 
differences are largest for the tests that have 
been categorized as measuring verbal, rea- 
soning, and numerical abilities. In the 
Spatial tests, however, the main difference 
occurs between Grades 8 and 9 (14% vs. 
26%). 

Thus, the class contributions to variance
in tests of a general intellectual and scholastic
content change most at the point
where self-selection between academic and
nonacademic programs takes place. When
vocational programs (some of them clearly
practical-mechanical in their orientation)
are introduced in Grade 9, class variability
increases also in Spatial tests.

Table 1
Percentage of Contributions to Total Variance From District and Class Levels

                                            Districts            Classes within districts
Primary Mental Abilities subtest        Grades 4-6  Grades 7-9    Grades 4-6  Grades 7-9

Synonyms and Opposites                       2           4            10          26
Letter Grouping and Figure Series            2           4            10          26
Metal Folding and Block Counting             1           2             8          18
Addition and Multiplication                  2           6            12          23
Identical Numbers and Highest Number         2           9            16          18
Pattern Copying and Mark Making              4           4            16

Even district contributions to variance are
higher in Grades 7-9 than in Grades 4-6.
Here, however, the elective programs in the
upper grades cannot be used for explaining
the difference, since self-selection to programs
takes place within districts, and each
district draws pupils from the same demographic
background in all six grades.

District proportions of 5% or more were
observed in five of the six tests, where timing
was expected to be important for test scores,
and where consequently, two parallel forms
were used in order to get reliability estimates.
Testing times were between 3 and 5
minutes per form. With such short testing
times, variations in test-taking sets and
irregularities in timing and disturbances
during test administration can have considerable
influence upon results.

The district means of the parallel forms of
the Speed tests have been inspected, and
deviations have been noted from the "normal"
development from grade to grade and
from the first to the second parallel form. In
both comparisons, considerable gains have
been found in most instances, but there are
a number of striking exceptions.

In the Multiplication and Addition tests,
and similarly for both parallel forms, the
means cease to grow or even decrease between
Grades 6 and 7 in all districts except
one, which continues to show gains of normal
size. This difference is likely to be a treatment
effect and may be due to differences
between districts in emphasizing continued
training of arithmetic skills. It explains the
increase of district variance between the
elementary grades and Grade 7, a difference
that remains in Grade 8 and to some extent
in Grade 9 too.

In the Highest Number and Mark Making 
tests, several district means show irregular 
trends from the first to the second parallel 
form. These differences often can be traced 
back to extreme deviations from the usual 
pattern in individual classes, which influence
also the district means and contribute to
district variance. Such differences seem to
be caused by irregularities in test administration.
To some extent, this explanation
holds also for Pattern Copying, but there,
one can also find a few instances of systematic
deviations of districts from the normal
age trend that look like treatment effects.

These examples may suffice to show that
the variations in district contributions can
be traced back to specific sources of influence,
and also that the variance proportions
are sensitive to such influences. A more
complete interpretation, however, requires
information about the teaching and testing
that is not available from these data.


Factor Analysis 


Factor analyses have been performed on 
the scale-free covariances at within- and 
between-classes levels. Only the 10 tests 
that could be categorized as markers of five 
primary mental abilities were included in the 
analyses. At the between-classes level, two 
principal components were extracted; at the 
within-classes level, five were extracted,
corresponding to the number of hypothesized
primary mental abilities. At both
levels, the components were rotated according
to the direct quartimin method
(Dixon, 1975, p. 368), which gives oblique
factors. From the five correlated factors in
the within-classes analyses, second-order
factors were extracted and rotated, also using
the direct quartimin method.

Table 2 shows the distribution of the total
variance of the battery between the unrotated
factors.

Table 2
Percentage of Distribution of Total Variance
Between Principal Components

                            Grade
Principal
component         4     5     6     7     8     9

Within classes
  1              32    37    33     ?     ?    22
  2              18    18    19    19    17    15
  3               9     9     8     9     7     8
  4               ?     ?     ?     ?     ?     ?
  5               ?     ?     ?     ?     ?     ?
  Remainder      16    14    14    14    12     ?
  Subtotal       88    89    84    76    72    69

Between classes
  1               6     5     9    18    20    24
  2               3     2     3     3     3     4
  Remainder       ?     ?     ?     ?     ?     ?
  Subtotal       12    11    16    24    28    31
Total           100   100   100   100   100   100

The subtotals between classes correspond 
to the average correlation ratio ($\eta^2$) expressed
as percentages. These are higher in
Grades 7-9 than in Grades 4-6, which is a 
finding that already has been interpreted in 
the previous section. The subtotals within 
classes show the opposite pattern. Almost 
all of these differences are accounted for by 
the first principal components within and 
between classes. The remaining parts of the 
variance are rather equally distributed be- 
tween levels and components in all six 
grades. The components taken out for 
rotation explain between 81% and 83% of the 
battery variance. 

In the within-classes analyses, the same
five factors appear in all grades except Grade
4. The results are shown in Table 3. The
factors have been reported in the same order 
of content in all grades, but with the corre- 
sponding factor numbers also given. 

All analyses show a neat, simple structure;
in Grades 5-9, the structure corresponds
almost exactly to that hypothesized in the
composition of the test battery. Among the
loadings above .24, only four (all of them in
the order of .30) appear in places where they
are not expected. Grade 4, however, deviates
from the usual Primary Mental Abilities
test pattern in two respects. The
Reasoning and Spatial tests go together in a
different way, and Numerical and Perceptual
Speed tests are not distinguished from
each other. The interpretation of the deviations
will be postponed until the Discussion
section. For the present, it will be
concluded that the majority of instances at
the within-classes level strongly supports the
usual categorization of the tests in five primary
mental abilities: Verbal, Reasoning,
Spatial, Numerical, and Perceptual Speed.
This result corresponds to what Henrysson
(1965) found in an analysis of the original
pooled matrices.

Next, the results of the second-order
analyses will be reported (see Table 4). In
Grades 5-9, where the usual Primary Mental
Abilities test pattern was found, the first-order
factors of Reasoning, Spatial, and
Verbal abilities (in the order mentioned)
form one second-order factor; the Numerical
and Perceptual Speed form (in varying
order) another. The correlations between
these two factors vary between .14 and .32.

Table 4
Second-Order Factor Loadings for the Within-Classes Analysis

First-order   Grade 4   Grade 5   Grade 6   Grade 7   Grade 8   Grade 9
factor         1   2     1   2     1   2     1   2     1   2     1   2

V      37 26 64 64 51 58 40
R      46 68 72 76 73 72
S      32 67 66 56 65 60
R/S    84
N      71 17 66 74 86
P      72 68 71 67 60
N/P    75

Note. Loadings below .25 are replaced by dots. Decimal points are omitted. Numbers 1 and 2 are factor numbers.
V = Verbal, R = Reasoning, S = Spatial, N = Numerical, and P = Perceptual Speed factors. R/S and N/P are mixed
factors appearing only in Grade 4.

The between-classes analyses are reported
in Table 5. In Grades 4-8, two distinct factors
appear. One is loaded in tests belonging
to the Verbal, Reasoning, and Spatial factors
at within-classes levels and one in Numerical
and Perceptual Speed tests. This distinction
corresponds to the main pattern found in the
second-order analysis of within-classes factors.
It also corresponds to the old distinction
between Power and Speed tests. Grade
9 shows another pattern, with the first factor
comprising all tests and the second contrasting
Spatial against Numerical and
Perceptual Speed tests, mostly, however,
with very small loadings. The correlations
between the two factors vary between .27
and .50, except in Grade 9, where the orthogonal
pattern was left unchanged. An
inspection of the loadings, however, indicates
that an alternative solution in better agreement
with the other analyses could be
found.

Table 5
Factor Loadings in Two Oblique Factors for the Between-Classes Analysis

Primary Mental      Grade 4   Grade 5   Grade 6   Grade 7   Grade 8   Grade 9
Abilities subtest    1   2     1   2     1   2     1   2     1   2     1   2

Synonyms            33 26 (23) 52 50
Opposites           29 (24) (22) 53
Letter Grouping     (18) (17) 31 39
Figure Series       34 (24)
Metal Folding       27 (23) 33 39
Block Counting      (22)
Addition
Multiplication      30 32
Identical Numbers
Highest Number

Note. Loadings below .25 are replaced by dots. Values in parentheses are given for tests without a loading above that level.
Decimal points are omitted. Numbers 1 and 2 are factor numbers.

Thus, in general, second-order within 
factors and first-order between factors show 
the same pattern, with Verbal, Reasoning, 
and Spatial abilities belonging to one factor 
and Numerical and Perceptual Speed abilities
to another.


Discussion and Conclusions 


The present study was initiated to serve 
as an empirical illustration of some of the 
methodological issues brought up by Cron- 
bach (Note 1) in his paper on the choice of 
level for design and analysis in classroom 
research. The available data seem to have 
been well suited to exemplify some of the 
analytical procedures in treating multilevel 
data. The data also illustrate the size of 
correlation ratios in an ordinary school sit- 
uation under different grouping principles. 
The content of the tests used in the study 
(Primary Mental Abilities), on the other 
hand, may not be well suited to register 
contextual effects of grouping or treatments 
varying between schools and classrooms. 
What is shown is mainly the sensitivity of 
different measures to preexisting differences 
related to the grouping of pupils. Even so, 
the study became not only a methodological 
illustration but also resulted in some findings 
of substantive interest. 

To begin with the decomposition of vari- 
ance into parts for different levels, the 
analyses have made it possible to detect and 
interpret the following sources of varia- 
tion: 

1. Demographic differences between the 
areas from which the pupils are recruited are 
shown in district contributions to variance 
in all grades and class contributions in the 
elementary grades. In general, these variations
are very small, which may be due to
two characteristics of the present data. The
variations come mainly from rural and small 
town districts and thus do not include, for 
instance, variations between big cities and 
suburban areas. Moreover, the variations 
come from a country where, according to 
international comparisons (cf. Thorndike, 
1973, p. 142), variations between schools in 
reading levels are small. 

2. Self-selection effects are shown in the 
class contributions in Grades 7-9. The 
choice between academic and nonacademic 
programs in Grades 7-8 gives rise to sub- 
stantial class contributions in Verbal, Rea- 
soning, and Numerical scores, that is, in 
abilities that dominate in measures of IQ and 
scholastic performance. The further sub- 
division in vocational programs in Grade 9 
also registers in Spatial tests. There is no 
reason to believe that the size of the self- 
selection effect found here is particularly 
small due to local or national characteristics 
as above; rather, it depends on the tracking 
system and the principles of class composi- 
tion. For instance, the present Swedish 
system, with heterogeneous classes also in 
the upper grades of the compulsory school, 
should not originate such large effects. 

3. Treatment effects are less likely to be 
found in ability than in achievement scores, 
but the Numerical tests evidently function 
in both ways. Differences in emphasizing 
arithmetic skills are shown in district con- 
tributions in the Addition and Multiplica- 
tion tests. 

4. Irregularities in test administration (cf. 
Gustafsson, 1978) were used to explain some 
of the larger district and class contributions 
in Speed tests.

In this list of possible effects, one type is
missing so far: contextual effects caused by
the interaction among pupils in the class. 
Such effects can be measured as differences 
between classes over and above what can be 
expected from the characteristics of the in- 
dividual pupil. The present analysis does 
not permit detection of such effects, nor does 
the content of the tests make it plausible to 
find any. 

From a substantive point of view, the most
interesting findings come from the factor 
analyses. Figure 1 presents an idealized 
picture of the results.

[Figure 1: hierarchical diagram with two columns, Individual level and Class level. At the individual level, g (Stratum I) stands above the Power and Speed factors, followed by the primary factors V, R, S and N, P, and then the 10 tests. At the class level, g stands above the Power and Speed factors, which rest directly on the 10 tests.]

Figure 1. Factor structure at individual and class
levels. (g = General Ability, V = Verbal, R = Reasoning,
S = Spatial, N = Numerical, and P = Perceptual
Speed factors.)


The main pattern is a hierarchical structure
with four strata at the individual level
and three strata at the collective level. The
two upper strata are similar at both levels of
analysis; the third is missing at the class
level, where the Power and Speed factors are
based directly on the tests without the intervening
level of primary factors. In some
preliminary analyses of class means, however,
where more than two factors were extracted
in spite of their small contributions
to total variance, some of the “primary”
factors could occasionally be traced. This
might indicate that the similarity could extend
also to the third stratum, though the
similarity is hardly detectable within the
range of variation studied here. Furthermore,
all unrotated first-order analyses at
the individual level have in their second
component a contrast between the six Power
and the four Speed tests. The same pattern
holds for all except one of the unrotated
analyses at the collective level. Thus, the
similarity between the two levels of analysis
is even more striking than the rotated factors
reported earlier indicate.

This similarity between the individual and 
collective levels suggests a sort of factorial 
isomorphism, that is, a hierarchical pattern 
repeating itself at varying levels of aggregation.
[...] nominally identical [...] different things at
different levels (cf. Cronbach, Note 1, pp. 9.8,
9.16-9.17). So, there is reason to be suspicious
about such ideas of factorial isomorphism
between levels of analysis.

In the present study, two different principles
of class composition are represented:
the neighborhood principle in Grades 4-6 
and the grouping-by-electives principle in 
Grades 7-9. Both result in similar factor 
patterns at the collective level, but there is 
a great difference between the proportions 
of variance explained by different compo- 
nents (see Table 2). 

The change in aggregation principles af- 
fects the first components strongly and 
leaves the second and remaining components 
fairly unchanged. The growth of the first 
between-classes component is substan- 
tial—from 7% to 21%. Even more inter- 
esting is that the second components, the 
contrasts between Power and Speed tests, 
are not affected at all. The conclusion so far 
is that the Power versus Speed contrast has 
nothing to do with the change of grouping 
principles. 

Further insight may be gained from the 
correlations between the separate Speed 
tests at class and individual levels (see Table 
6). Since the reliabilities differ considerably 
between these tests, the correlations have 
been corrected for attenuation. 
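
The article does not spell out the correction formula. As a sketch, assuming the standard correction for attenuation (the observed correlation divided by the square root of the product of the two reliabilities), it could be computed as follows. The function name and the numerical values are hypothetical.

```python
from math import sqrt


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Disattenuate an observed correlation r_xy, given the reliabilities
    rel_x and rel_y of the two tests (assumed standard formula)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / sqrt(rel_x * rel_y)


# Hypothetical values: observed r = .60, reliabilities .81 and .64.
corrected = correct_for_attenuation(0.60, 0.81, 0.64)
print(round(corrected, 3))
```

The less reliable the two tests, the larger the upward adjustment, which is why the correction matters when reliabilities differ considerably between tests.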

The four tests are presented in Table 6 in 
a hypothetical order of complexity from 
simple perceptual scanning for Identical 
Numbers, over scanning in combination with 
size comparison, to Addition and finally 
Multiplication. The within-classes corre- 
lations then form a simplex pattern (Gutt- 
man, 1954). The highest correlations are 
next to the diagonal, and from there on, 
correlations decrease in both rows and col- 
umns. This pattern holds both in the ele- 
mentary and upper grades, and not only in 
the averages reported in Table 6 but in all 
matrices as well. This ordering of percep- 
tual and arithmetical operations with num- 
bers along a common scale of complexity 
corresponds to what Guttman (1966, pp. [...]9)
found when applying order analysis
to Coombs’s (1941) data on number ability.
The regularity is even more striking in the
present study.
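
The simplex property described above (highest correlations next to the diagonal, decreasing along rows and columns) can be checked mechanically. The sketch below is only an illustration: the function name `is_simplex` is hypothetical, the first matrix uses the within-classes values for Grades 4-6 as they appear to read in Table 6 (decimal points omitted), and the second matrix is an invented counterexample.

```python
def is_simplex(lower_rows):
    """Check the simplex ordering for a correlation matrix given as the
    rows of its lower triangle (row i holds correlations of variable i
    with variables 0 .. i-1, for i = 1, 2, ...).
    """
    size = len(lower_rows) + 1
    r = [[None] * size for _ in range(size)]
    for i, row in enumerate(lower_rows, start=1):
        if len(row) != i:
            raise ValueError("row %d must contain %d values" % (i, i))
        for j, value in enumerate(row):
            r[i][j] = r[j][i] = value

    for i in range(size):
        left = [r[i][j] for j in range(i)]             # moving toward the diagonal
        right = [r[i][j] for j in range(i + 1, size)]  # moving away from it
        if any(a > b for a, b in zip(left, left[1:])):
            return False
        if any(a < b for a, b in zip(right, right[1:])):
            return False
    return True


# Order: Identical Numbers, Highest Number, Addition, Multiplication.
within_grades_4_6 = [[73], [59, 71], [52, 62, 87]]
hypothetical_non_simplex = [[50], [20, 45], [40, 35, 55]]
print(is_simplex(within_grades_4_6), is_simplex(hypothetical_non_simplex))
```

Because the matrix is symmetric, checking every row on both sides of the diagonal also covers the columns.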
Between classes, on the other hand, no
such ordering can be found. The correlations
are fairly equal in size but with a surplus
amount of commonality for the Addition
and Multiplication tests. Class level
correlations are higher in the upper than in
the elementary grades. The difference is
reversed at the individual level, which has to
do with the size of between-classes variation
at the two stages. These differences are likely
to have been absorbed by the first components.

Table 6
Average Correlations Between Numerical and Perceptual Tests (Corrected for Attenuation)

Primary Mental             Grades 4-6        Grades 7-9
Abilities subtest           1    2    3       1    2    3

Within classes
  1. Identical Numbers
  2. Highest Number        73                69
  3. Addition              59   71           53   66
  4. Multiplication        52   62   87      50   55   84
Between classes
  2. Highest Number        72                85
  3. Addition               ?    ?           88   87
  4. Multiplication         ?   63    ?       ?    ?    ?

To conclude this part of the discussion, it
seems evident that the change of grouping
principles not only leaves the variance of the
Power versus Speed contrast unaffected, but
that also the processes operative in the
[...] In contrast, different processes
seem to be effective at individual and class
levels, and only the individual correlations
can be explained by means of the order of
complexity of the tasks.

From here on, the [...] attempt needs support
[...] classroom processes that are not available
from these data.

In the elementary grades, the distribution
of variance is an expression of “natural”
variation between unselected classes.
Classes vary in general level due to differing
demographic backgrounds of the pupils (the
Power factor) but also in their stress upon
speed and accuracy of performance. When
the grouping principle changes in Grade 7,
classes still show some natural variation in
general level (the part associated with district
variation; see Table 1); but in addition, [...]


So far, the discussion has been based on
the idealized picture in Figure 1. Some deviations
from this picture should also be
noted. In Grade 4, the otherwise regular
Primary Mental Abilities test pattern at the
individual level is broken, and the tests sort
themselves into the following groups: (a)
Addition, Multiplication, Identical Numbers,
and Highest Number; (b) Synonyms
and Opposites; (c) Figure Series and Metal
Folding; (d) Block Counting and Metal
Folding; and (e) Letter Grouping and Identical
Numbers.

The merging of the Numerical and Perceptual
Speed factors in the youngest groups
can easily be understood from the pattern of
correlations between the tests at the individual
level. The average difference of
correlations within versus between the two
categories of tests is a little larger in the
matrices that give two factors than in the one
that merges them; but the difference is not
great, and results in both cases are consonant
with the order-of-complexity hypothesis.
To begin with, I was surprised that the two
factors merged in the younger and not in the
older groups, where an automatization of
arithmetical operations was more likely to
have taken place, making the two types of
tasks less distinct from each other. But it is
also possible that identification and comparison
of numbers still are fairly complex
tasks in the lower grades and therefore more
similar to arithmetic, while automatization
comes later and perhaps only in select groups
of students.

The other irregularities were found in the
Reasoning and Spatial tests. The Figure
Series and Metal Folding tests go together.
Both are based on figural materials, and
from earlier studies (cf., e.g., French, 1965),
it is known that some Spatial tests can be
solved by reasoning processes. An inspection
of the correlations at the individual level
indicates that Reasoning and Spatial tests
also may form simplex patterns. Table 7
shows average correlations in elementary
and upper grades. Boys and girls have been
kept separate because average sex differences
go in different directions in the Letter
Grouping and the Spatial tests.

Table 7
Average Within-Classes Correlations Between Reasoning and Spatial Tests

Primary Mental            Grades 4-6        Grades 7-9
Abilities subtest          1    2    3       1    2    3

Boys
  1. Letter Grouping
  2. Figure Series        44                46
  3. Metal Folding        38   51           21   42
  4. Block Counting       33   41   53      38   38   48
Girls
  2. Figure Series        49                50
  3. Metal Folding        42    ?           38    ?
  4. Block Counting       41   43   51      40   39   57

The simplex pattern holds for Grades 4-6.
In Grades 7-9, the correlation between
[...] simplex pattern, it is hardly meaningful to
talk about an increasing order of complexity
of the tasks. Rather, the ordering seems to
express varying mixes of reasoning and visualization
processes that go into the solution
of the problems, with identifying deviations
from the principles of Letter
Grouping at the reasoning end and counting
of partly concealed blocks in two-dimensional
representations at the visualization
end of the continuum. In some instances,
the intermediary tasks, Figure Series and
Metal Folding, are close to each other and
form a factor, leaving the extreme tests isolated
at the ends. If only common variance
had been analyzed with a proper number of
factors instead of the total within-classes
variance actually analyzed, the extreme tests
would possibly not have appeared as factors.

The analyses upon which these interpre- 
tations have been based deviate from tradi- 
tional unrestricted factor analysis in a 
number of respects: Parallel analyses were 
performed at individual and class levels. 
The analyses were based on scale-free co- 
variances instead of correlations. The total 
variances at the two levels were analyzed 
instead of common variance. The number 
of factors was restricted in advance accord- 
ing to the hypotheses. 

Still another characteristic of the present 
design, that is, the parallel matrices from six 
grades, could have been used for an addi- 
tional restriction. A common hypothetical 
structure could have been tested against the 
data by means of Jöreskog's (1971) method
for simultaneous factor analysis in several 
populations. The finding of similar factor 
structures from several independent analy- 
ses using standard computational programs 
indicates that this method might be fruit- 
fully applied to the present data.

All analyses in the present study were
done both for intact classes and for boys and 
girls separately. The separate analyses were 
done primarily as a safeguard against mean 
score sex differences’ influencing the struc- 
ture of the covariance matrices, that is, in- 
creasing covariances between tests that have 
sex differences in the same direction and 
decreasing them in the opposite case. In 
general, the results were rather similar in all 
sets of analyses. 

This attempt to protect the analyses from
effects of pooling data from boys and girls,
however, brings up a central question in the 
discussion of multilevel analysis. Provided 
that the criteria of separation are not evident 
from theory and design, which degree of 
decomposition should be used in data anal- 
ysis? Conventionally, age (or grade) and sex 
variations are kept out of most analyses of 
educational data by means of grouping or are 
at least checked before aggregation. The 
reason for this is that these variables nor- 
mally represent strong sources of preexisting 
variation that can mask differences to be 
focused on in the particular study or can in- 
teract with them. School and class differ- 
ences, on the other hand, are not part of this 
convention of separating groups in the 
analysis. Even though sampling often is 
based on such units and not on the individ- 
ual pupils, data are usually pooled before 
analysis. From the point of view of signifi- 
cance testing, this practice is always doubt- 
ful. When treatments are given in groups 
and contextual effects are likely to appear, 
Cronbach (Note 1) shows that it is mislead- 
ing also from a theoretical point of view. 

In a description of “natural” variation, on 
the other hand, pooling might be permissi- 
ble, since demographic variation between 
units is a legitimate part of the total varia- 
tion to be described and analyzed. The aims 
of the study for which the present data were 
collected were primarily of this descriptive 
kind. Even in this case, however, the de- 
composition of variances and covariances has 
provided more distinctive information than 
the corresponding parts of the original 
pooled analysis. It seems to me that such
decomposition is to be preferred even in a
description of natural variation, perhaps
followed by recombining the different
sources of variation into the full varia-
tion. However, since for economic reasons
most collections of data permit only a limited 
number of grouping variables, the problems 
remain: (a) In what order of priority should 
different sources of variation be considered? 
(b) When is it time to stop decomposition? 
(c) What level of analysis gives the most 
relevant information? These questions 
have to be answered in each study and 
should be answered preferably on the basis 
of theory rather than convention. 


Reference Note 


1. Cronbach, L. J. Research on classrooms and schools:
Formulation of questions, design, and analysis. 
Stanford, Calif.: Stanford University, Evaluation 
Consortium, July 1976. 


References 


Coombs, C. H. A factorial study of number ability.
Psychometrika, 1941, 6, 161-189.

Dixon, W. J. (Ed.). Biomedical computer programs.
Berkeley: University of California Press, 1975.

French, J. W. The relationship of problem-solving
styles to the factor composition of tasks. Educational
and Psychological Measurement, 1965, 25, 9-28.

Gustafsson, J. E. A note on class effects in Aptitude X
Treatment interactions. Journal of Educational
Psychology, 1978, 70, 142-146.

Guttman, L. A new approach to factor analysis: The
Radex. In P. F. Lazarsfeld (Ed.), Mathematical
thinking in the social sciences. Glencoe, Ill.: Free
Press, 1954.

Guttman, L. Order analysis of correlation matrices. In
R. B. Cattell (Ed.), Handbook of multivariate
experimental psychology. Chicago: Rand McNally, 1966.

Härnqvist, K. Individuella differenser och skoldifferentiering
(No. 13). Stockholm, Sweden: Statens Offentliga
Utredningar, 1960.

Härnqvist, K. Canonical analyses of mental test profiles.
Scandinavian Journal of Psychology, 1973, 14,
282-290.

Henrysson, S. Faktoranalys av Härnqvists anlagsprov.
Nordisk Psykologi, 1965, 17, 11-20.

Jöreskog, K. G. Simultaneous factor analysis in several
populations. Psychometrika, 1971, 36, 409-426.

Thorndike, R. L. Reading comprehension education
in fifteen countries. New York: Wiley, 1973.

Thurstone, L. L., & Thurstone, T. G. Factorial studies
of intelligence. Psychometric Monographs, 1941,
No. 2.

Winer, B. J. Statistical principles in experimental
design (2nd ed.). New York: McGraw-Hill, 1971.


Received September 6, 1977


Journal of Educational Psychology
1978, Vol. 70, No. 5, 717-729


Reading Skill and the Role of Verbal 
Experience in Decoding 


Thomas W. Hogaboam and Charles A. Perfetti
University of Pittsburgh 


Previous studies have shown that decoding speeds are generally faster for
skilled readers than for less-skilled readers. In Experiment 1 of the present
study, this reader difference was found to be greater for pseudowords and two-
syllable units than for English words and one-syllable units. Experiments 2
and 3 provided skilled and less-skilled readers with various types of experience 
with pseudowords prior to two decoding tests, vocalization latency and same- 
different decisions. Aural and printed experience with pseudowords provided 
significant increases in decoding speeds for both reader groups, but providing 
meanings for pseudowords as a part of the experiences added nothing. These 
effects were still present after 10 weeks. These studies suggest that decoding 
differences are not wholly attributable to prior experience with word units and 
that processes involving phonetic components may be involved. 


The relationship between single-word
decoding and reading comprehension is of
critical importance for a theory of skilled
reading and for reading instruction. A
general framework for a theoretical rela-
tionship is the assumption of a limited-ca-
pacity central processor that shares re-
sources among the various component
reading processes. This processing as-
sumption is quite general for cognitive psy-
chology and has been given a particular
treatment by LaBerge and Samuels (1974)
that emphasizes the control of attention in
reading. Attention to semantic properties
of input is presumably characteristic of
skilled reading, and attention to phonetic or
visual features is more characteristic of de-
manding reading material and perhaps of
less-skilled readers.

Given this theoretical framework, a cor-
relation between single-word decoding
ability and reading comprehension should
exist. Perfetti and Hogaboam (1975) found
that single-word vocalization latencies (VLs)
were longer for less-skilled readers, as de-
fined by a comprehension measure, than for
skilled readers. Accuracy of pronunciation
is also related to reading comprehension
(Shankweiler & Liberman, 1972). These
studies, however, are correlational in nature.
Skilled readers presumably have had more
reading experience and thus more encoun-
ters with the words that are used to measure
decoding ability. Consistent with this in-
terpretation is the fact that the differences
in VL between skilled and less-skilled
readers are greater for less frequently oc-
curring English words than for more fre-
quent words (Perfetti & Hogaboam, 1975).
However, it was also found that the differ-
ence between skilled and less-skilled readers
was greater for pseudowords than for real
words. Presumably, pseudowords have
never been seen before by either skilled or
less-skilled readers. This suggests that
skilled readers’ faster decoding speeds are
due to superior word-decoding processes
rather than to simple frequency of exposure
to particular words.

The source and nature of this superiority
are open to question. Many investigators
have hypothesized that decoding involves
the use of rules that operate over subword
units, such as spelling patterns (Gibson,
Pick, Osser, & Hammond, 1962), vocalic
center groups (Spoehr & Smith, 1973), or
letter positions (Mason, 1975). Skilled
readers may possess a better implicit
knowledge of subword regularities than
less-skilled readers.

Copyright 1978 by the American Psychological Association, Inc. 0022-0663/78/7005-0717$00.75

Experiment 1 evaluated this hypothesis
by varying the number of syllables of both
real words and pseudowords. If subword
processes are a source of decoding differ-
ences between skilled and less-skilled read-
ers, the reader difference should be
greater for two-syllable than for one-syllable
items and greater for pseudowords than for
real words. The experiment tested this hy-
pothesis by observing the effects of “word-
ness” and number of syllables on vocaliza-
tion latencies of children differing in reading
comprehension skill.


Method


Subjects 


Subjects were 72 third- and fourth-grade students 
from an urban parochial school in a predominantly 
white, working-class neighborhood. For purposes of 
data analysis, the sample size was 36 third-grade and 30 
fourth-grade students following the elimination of
subjects who did not complete all experimental sessions.


Materials and Apparatus 


Forty-eight experimental items were selected which
were either real words or pseudowords and, orthogo-
nally, one or two syllables. All real words were high-
frequency words, at least 138 per million (up to 1,465
per million) based on the printed frequency count of
Carroll, Davies, and Richman (1971). The median
frequencies were 426 and 309 for one- and two-syllable
words, respectively. One-syllable words had three
letters; two-syllable words had five. The 12 one-syllable
nonwords were taken from the Underwood and Schultz
(1960) norms for pronounceability of consonant-
vowel-consonants (CVCs). The 12 selected CVCs were
all highly pronounceable for nonword CVCs, ranging
from ratings of 2.06 to 3.44, where a rating of 1 is easy,
5 is average, and 9 is hard to pronounce (Underwood &
Schultz, 1960). The 12 two-syllable nonwords were all
CVCVCs with acceptable English spelling patterns and
judged by the experimenters to be easy to pronounce
(hence, "pseudowords"). All experimental items plus
6 additional warm-up items, half real words and half
pseudowords, were typed in lowercase elite type on
white cards and photographed for 2 X 2 inch (5.08 X
5.08 cm) transparencies.

The slides were projected onto a rearview screen using
a Kodak Carousel projector equipped with a solenoid-
operated shutter. The opening of the shutter started
a timer (Hunter Klockcounter), which was stopped by
a voice-operated relay at the subject's initial
vocalization, which also closed the shutter. This ap-
paratus thus allowed the measurement of vocalization
latency, defined as the elapsed time from the presen-
tation of a word to the subject's initial vocalization.


Procedure 
The first part of the study consisted of collecting ...
Durrell Listening ... they should not try to "sound out"
the words, since the words would disappear as soon as
they started. Following six warm-up items, the 48
experimental items were presented in four blocks of 12
each, each block consisting of all the items of one type.
Since Perfetti and Hogaboam (1975) presented items in a
... seen. The order of the blocks was counterbalanced
across subjects. Vocalization latencies were recorded
after each trial, with the 48 trials continuing without
interruption, except for a brief pause between blocks.
This part of the study required less than 20 minutes.


Results 


... intermediate form). The mean vocalization latencies
for each ...

For the third grade, the skilled readers were faster
overall than the less-skilled readers, F'(1, 37) = 17.7,
p < .001. There was also a main effect of number of
syllables, F'(1, 57) = 27.99, p < .001, and item type,
F'(1, 58) = 24.25, p < .001, with one-syllable items
being faster than two-syllable items and real words
faster than pseudowords. ... significant interaction
between number of syllables and subjects within reading
skill, F(34, 1496) = 7.92, p < .001, indicating that
individual factors beyond those common to a skill group
interacted with the syllable effect.

Table 1
Mean Vocalization Latencies in Experiment 1 (in sec)
[Rows: grade (third, fourth) by reading skill (skilled, less skilled) by item type (word, pseudoword). Columns: number of syllables (1, 2). The cell values are not legible in the source.]

The Reading Skill X Item Type interac-
tion was also significant, F'(1, 42) = 6.97, p
< .01. Examination of Table 1 shows that
the nature of this interaction was the same
as was found previously when the items were
randomly arranged (Perfetti & Hogaboam,
1975). While the difference between pseu-
dowords and real words averaged only 318
msec for skilled readers, less-skilled readers
took an additional 1,061 msec for pseudo-
words. In addition to reading-skill differ-
ences in the real word-pseudoword effect,
there was again substantial variance within
each ability group on the item type effect,
F(34, 1496) = 7.17, p < .01.

Finally, as would be expected, both
subjects, F(34, 1496) = 29.29, p < .01, and
items, F(44, 1496) = 4.52, p < .01, contrib-
uted significant variance. Thus, there were
significant individual differences that were
not accounted for by the grouping of subjects
into two skill groups; furthermore, the sig-
nificant higher order interactions of subjects
within groups mean that this variability was
greater for subjects within the less-skilled
group than for subjects within the skilled
group.

The above pattern of results was repli- 
cated by the analysis of the fourth-grade 
data. The only possible divergence worth 
noting is that the Reading Skill x Number 
of Syllables interaction was associated with 
a higher Type I error probability, F’(1, 33) = 
3.39, p = .075. 


Discussion 
The results of this experiment replicate
and extend the finding that children grouped
by comprehension skill differ in single-word
vocalization latency (Perfetti & Hogaboam,
1975). In particular, the real word-pseu-
doword interaction with reading skill is
replicated. Coupled with the observed in-
teraction between number of syllables and
reading skill, this suggests that a basic coding
process, not dependent on whole-word rec-
ognition, distinguishes skilled from less-
skilled readers. To quantify the interactions:
for a skilled third-grade reader, the cost of a
second syllable was about 240 msec, but for
a less-skilled reader it was over 1 sec. The
magnitude of this cost difference is some-
what reduced for the fourth grade, about 330
msec for skilled readers and 780 msec for
less-skilled readers, but consistent with the
third-grade data. Thus, less-skilled readers
took longer to either decode or program the
vocalization of a second syllable. This in-
teraction rules out simple orienting and/or
gross motor programming (but not sylla-
ble-level motor programming) as a source of
the reader difference. If skill differences
were attributable to some kind of general
reaction-time factor, there would be no syl-
lable interactions, since such a factor would
make its contribution to both syllable lengths
equally.
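
The logic of this argument can be sketched as a difference of differences. In the example below, the function names and the one-syllable baseline latencies (1,000 and 1,500 msec) are hypothetical values chosen only for illustration; the second-syllable costs of 330 and 780 msec are the approximate fourth-grade figures quoted in the text.

```python
def second_syllable_cost(one_syllable_ms, two_syllable_ms):
    """Extra latency (msec) attributable to the second syllable."""
    return two_syllable_ms - one_syllable_ms


def skill_by_syllable_interaction(skilled, less_skilled):
    """Difference of differences; each argument is (one_syllable_ms, two_syllable_ms)."""
    return second_syllable_cost(*less_skilled) - second_syllable_cost(*skilled)


# Hypothetical baselines; the 330-msec cost is the skilled fourth-grade figure.
skilled = (1000, 1330)

# A general reaction-time factor adds the same constant to every cell,
# so it cancels out of the interaction contrast.
uniformly_slower = tuple(latency + 500 for latency in skilled)
print(skill_by_syllable_interaction(skilled, uniformly_slower))  # 0

# Hypothetical baseline with the 780-msec cost quoted for less-skilled fourth graders.
less_skilled = (1500, 2280)
print(skill_by_syllable_interaction(skilled, less_skilled))  # 450
```

A uniform slowing leaves the contrast at zero, whereas the costs reported for the fourth grade give a contrast of about 450 msec, which is the pattern a general reaction-time factor alone cannot produce.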

The “wordness” effect was comparable to
the syllable effect in its interaction with
reading level. Third-grade skilled readers
required only an additional 300 msec to
begin vocalizing a pseudoword, compared
with a real word. The less-skilled third
graders required an extra 900 msec. For
fourth graders the figures were comparable,
about 400 msec for skilled and 1,200 msec for
less-skilled readers. Together with the
syllable results, this implicates a basic pro-
cess difference. It cannot be a question of
familiarity at the word level, since pseudo-
words are unfamiliar to all subjects.

It is not possible to determine from this
experiment what subword units are involved
in the decoding process. The effects of
number of syllables in this study could be
due to number of syllables, per se, or to any
other subword unit measure that correlates
with number of syllables, such as number of
letters. The conclusion, then, is not that any
particular unit has been identified as the
unit of decoding, but that whole-word ex-
perience in itself is not responsible for de-
coding speed differences between skilled and
less-skilled readers. Both the Real Word-
Pseudoword X Skill interaction and the
Syllable Length X Skill interaction argue
against this interpretation. Thus, subword
processes are implicated in decoding differ-
ences between skilled and less-skilled read-
ers.

Two reservations should be noted. First,
amount of prior experience with subword
units cannot be ruled out as a source of
reader differences. However, experience
with subword units would presumably come
through exposure to words. These words
would need to be appropriately parsed into
the type of subword unit that would later be
useful in decoding whole words. When a
word is decoded, this knowledge would have
to be applied. This, however, is equivalent
to saying that decoding skills are acquired
and applied. Second, the locus of the syl-
lable effects may lie in a motor programming
stage instead of in a decoding stage. This
interpretation cannot be ruled out in the
present study, since the dependent measure,
vocalization latency, always contains a motor
programming component. This possibility
will be further examined in Experiment 2.


Experiment 2 


Experiment 2 addresses three issues. 
First, while subword processes are impli- 
cated in Experiment 1, the correlational 
nature of the study leaves open an alterna- 
tive explanation for the Number of Syllables 
X Skill interaction. The difference between 
skilled and less-skilled readers’ previous 
experience with words may be greater for 
two-syllable words than for one-syllable 
words. To evaluate this explanation we 
provided skilled and less-skilled readers with 
a comparable amount of experience with 
pseudowords. Pseudowords were used to 
assure a true zero value for all subjects prior 
to the experiment. To the extent that the 
amount of prior whole-word experience is 
responsible for reader differences, providing 
equal amounts of experience should reduce 
these decoding differences.

Second, the quality of prior experience is
of interest for both theoretical and instruc-
tional purposes. Unskilled readers may be
failing to benefit from particular types of
verbal experiences. For example, verbal
encounters can be aural or through print.
Furthermore, a child may or may not be fa-
miliar with the meaning of any given word.
By varying the quality of prior experience
with a pseudoword, it is possible to deter-
mine whether skilled readers are making use
of certain types of information that are not
affecting less-skilled readers. Our hypoth-
esis is that skilled and less-skilled readers
differ quantitatively rather than qualita-
tively in word decoding skills and that the
type of experience that affects skilled readers
will also affect less-skilled readers.

The assumption of the word-experience 
variable is that some word experiences rel- 
evant for fast decoding can be simulated 
with artificial words. The intention was to 
provide systematic combinations of aural, 
visual, and semantic experiences with 
pseudowords and to observe the effects of 
these experiences on the decoding latencies 
of skilled and less-skilled readers. Each 
subject was given four types of word expe- 
rience over 3 days: aural only, aural plus 
meaning, print only, and print plus meaning. 
These conditions simulated four types of 
encoding for a real word that a child might 
have had prior to a measured decoding. 

Finally, the question of whether such 
variables affect a decoding stage or a motor 
programming stage was explored by having 
two dependent measures. One is vocaliza- 
tion latency, measured as in Experiment 1. 
The second is latency of letter-string 
matching, which measures the amount of 
time required by a subject to decide whether 
two letter strings are the same. If experi- 
ences with words primarily affect motor 
programs preparatory to articulation, then 
only vocalization should be affected. If their 
effect is also on earlier decoding stages, then 
matching times should also be affected. 


Method 


Subjects 


Ten pairs of subjects were matched on IQ and sex,
with one member of each pair being a skilled reader and
the other a less-skilled reader; the subjects were from
the fourth grade of the same school described in Ex-
periment 1. Skilled readers were above the 60th per-
centile on the Metropolitan Achievement Test, Reading
Subtest, and less-skilled readers were below the 40th
percentile. There were five levels of IQ with two pairs
of subjects at each level, a male pair and a female pair,
with one member being a skilled reader and the other
a less-skilled reader.


Materials and Design 


The basic design of this experiment involved one
between-subjects variable, reading skill, and one
within-subjects variable, condition, which had seven
levels. For each subject there were 3 words within each
condition. ... the type of experience that the subject
was given for the words within the condition. Four of
the conditions involved some type of experience with
pseudowords prior to testing. The other three were
comparison conditions, which involved no prior
experience. The four experience conditions were aural
experience only (+A), aural plus meaning (A + M), print
experience only (+P), and print plus meaning (P + M).
The three comparison conditions were (a) a base
condition with CVCVCs (pseudowords) that had not
been seen or heard before, (b) medium-frequency words
(87 per million), and (c) high-frequency words (333 per
million).

...y four average readers from the fifth grade par-
ticipated in the vocalization task, 8 subjects per list.
... For two lists, the mean VL was 1.43 sec, for two other
lists the mean VL was 1.44 sec, and for one list it was
1.45 sec. No item had a mean latency of less than 1.31
sec or more than 1.52 sec. The assignment of lists to
conditions was then rotated through subjects, so that
each list occurred equally often in each condition.
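
One way such a rotation could work is a cyclic shift of lists across subjects. The sketch below is an assumption for illustration, not the authors' documented procedure: it supposes five pseudoword lists (labeled L1 to L5 here) assigned to the five pseudoword conditions, with 10 subjects per reading-skill group, and the function name is hypothetical.

```python
def rotate_lists(lists, conditions, n_subjects):
    """Cyclic rotation: subject s receives list (c + s) mod k in condition c."""
    k = len(lists)
    if len(conditions) != k:
        raise ValueError("need as many conditions as lists")
    schedule = []
    for subject in range(n_subjects):
        assignment = {
            conditions[c]: lists[(c + subject) % k] for c in range(k)
        }
        schedule.append(assignment)
    return schedule


# Hypothetical labels for the lists; condition labels follow the text.
lists = ["L1", "L2", "L3", "L4", "L5"]
conditions = ["Base", "+A", "A + M", "+P", "P + M"]

for subject, assignment in enumerate(rotate_lists(lists, conditions, 10), start=1):
    print(subject, assignment)
```

With 10 subjects and five lists, each list lands in each condition exactly twice, so item differences between lists cannot be confounded with condition.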

... meaning. It was necessary to invent six conceptual
descriptions. These mimicked common nouns and included
defining attributes and functions that the referent
could serve. The defining attributes ... a kind of
banana, red with no curved ... of frog, with smooth skin
and no ... more than the essential definitions given
here. For example, the banana was described as the
primary crop of a certain island whose ... depended
upon it for food and other things. ... for each concept,
a color sketch depicting the object was ...

Procedure 


The initial phase of the study extended 5 ...
words and 2 days to collect vocalization and matching
... vocalization ... latencies, repeating the order of
the first tests. Each session was tape-recorded, and a
tally was kept of the number of times the pseudoword
was heard and produced by the subject. By varying the
number of times a particular ... was repeated on the
third day, it was possible to assure that each
pseudoword was heard and produced a minimum of 15
times, ... the actual tallies varied slightly and ranged
to 24 ... Following are descriptions of the experience ...

P + M. In this condition, on the first day, subjects
were given a 3-page booklet with one pseudoword per
page. They were told to look at each word, listen as
the experimenter said it, repeat the word, and then lis-
ten carefully while the experimenter read the meaning.
The subjects were also shown a picture of the item while
the meaning was being given. This procedure was then
repeated with a different ordering of these words. The
subjects were then asked if they knew the meanings, and
the procedure was repeated if they did not. If the
subjects indicated that they knew the meanings, they
were given a booklet (one word on a page) and asked to
read each aloud and give the meaning. Any errors in
pronunciation or meaning were corrected. After pro-
ducing the meaning when given the word, the subjects
were then asked to pick the correct word from a set of
three when given a meaning. The experimenter pro-
duced a meaning ("This is the one that . . ."), and the
subject was asked to pick the word that was appropriate
and say it aloud. Again, any errors were corrected.
This procedure was continued until the subject had
gone through two booklets.

This same general procedure was repeated on the
following 2 days. Its essential features were that the
subject said the word, received feedback from the ex-
perimenter, and was required to both recall the word
given its meaning, and vice-versa.

+P. In this condition on the first day, the subject
was given a booklet and asked to look at each word and
repeat it after the experimenter pronounced it. Any
errors were corrected and the procedure was repeated.
The subject was then given another booklet and asked
to read through it aloud. The experimenter corrected
any mispronunciations and usually repeated the word.
This procedure was repeated up to five times per ...
condition was similar to P + M except that no meaning
experience was given.

A + M and +A. These aural procedures paralleled
the P + M and +P conditions, respectively, except that
no booklets were used. That is, inputs were limited
to the experimenter's producing the words and/or the ...

Following 3 days of experience, ... vocalization latencies
... seen nor heard before, plus 3 medium- and 3 high-fre-
quency real words. The 21 items were randomly or-
dered to construct one list and then reversed to construct
a second list, each order seen by half of the subjects.

On the day following the vocalization ... The task
is referred to here as a matching task because there was
a single response key to be depressed when ... the same
("matched") ... shutter opened and was ... subject's
button press.


Results


An examination of the cell distributions
indicates that in some instances the data
were positively skewed. To reduce the ef-
fects of long reaction times, all latencies were
converted to reciprocal latencies (... As
positive skew was a problem in some
cases, however, the results reported below
are based on speed scores.
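
The effect of the reciprocal transformation can be sketched as follows. The latencies below are hypothetical values invented for illustration (they are not data from the experiment), and the helper names are assumptions; the point is only that taking reciprocals pulls in the long right tail that a few slow responses create.

```python
import statistics


def skewness(values):
    """Population skewness: mean cubed z-score."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def speed_scores(latencies_sec):
    """Reciprocal latencies (responses per second)."""
    return [1.0 / t for t in latencies_sec]


# Hypothetical latencies in seconds, with two very slow responses.
latencies = [1.1, 1.2, 1.2, 1.3, 1.4, 1.5, 1.7, 2.0, 3.5, 6.0]

print("skew of raw latencies:", round(skewness(latencies), 2))
print("skew of speed scores: ", round(skewness(speed_scores(latencies)), 2))
```

In this example the raw latencies are strongly positively skewed, while the speed scores have a much smaller absolute skew, so the slowest responses no longer dominate the cell means.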


First Test, Vocalization Latencies 


The mean vocalization latencies for both
skilled and less-skilled readers in all condi-
tions are presented in Table 2. Each cell
represents 30 latencies, 3 in each condition
for 10 subjects in each reading-skill group.

A five-factor analysis of variance was used
to evaluate the data. The factors were IQ
(five levels), sex, reading skill (skilled vs.
less-skilled readers), condition (seven condi-
tions), and word (three in each condition).
The analysis revealed a significant effect of
condition, F(6, 24) = 24.59, p < .001, and a
... effect of skill, ... meaning (F < 1). A
comparison of the base
condition with the training conditions
showed that on the average, the base-con-
dition pseudowords had longer VLs than the
pseudowords that occurred in the experience
conditions, F(1, 24) = 76.00, p < .001. The
VLs for high-frequency real words were
shorter than those for the medium-fre-
quency words, F(1, 24) = 16.69, p < .001, and
real words in general had shorter VLs than
pseudowords, F(1, 23) = 24.30, p < .001.
Additionally, one nonorthogonal comparison
(Newman-Keuls) confirmed that the VL was
longer in the base condition than in the
meaningless aural-experience (+A) condi-
tion (p < .05). That is, simply having heard
and pronounced a pseudoword reduced
vocalization latencies.

Table 2
Mean Vocalization Latencies in Experiment 2 (in sec)
[Rows: first test and retest, each for skilled and less-skilled readers. Columns: the pseudoword and word conditions. The cell values are not legible in the source.]

The only other significant effects involved
the IQ factor and its interactions. The main
effect of IQ, F(1, 280) = 5.66, p < .01, was
compromised by interactions with both sex,
F(4, 280) = 17.80, p < .001, and reading skill,
F(4, 280) = 17.07, p < .001, which in turn
were compromised by an IQ X Sex X Read-
ing Skill interaction, F(4, 280) = 15.50, p <
.001. Because this high-order interaction
represents only one subject per cell, it is
completely confounded with uncontrolled
individual differences. An inspection of the
appropriate means revealed no systematic
trends.


Matching Latency, First Test 


The mean matching latencies for the seven
conditions are shown in Table 3. The
analysis was the same as for vocalization
latencies with all latencies first being trans-
formed into speed scores. Again there was
a significant effect of condition, F(6, 24) =
4.81, p < .01, and no Reading Skill X Con-
dition interaction (F < 1). There was also
no main effect of reading skill (F < 1). A
breakdown of the condition effect showed
that having seen the pseudoword in print
reduced matching latencies, F(1, 24) = 5.00,
p < .05, and that, on the average, matching
times for the experience conditions were
shorter than for the base condition, F(1, 24)
= 16.68, p < .001. Again there was no effect
of having a meaning for a word (F < 1), and
there was no Meaning X Print interaction (F
< 1). The effect of high- versus medium-
frequency words was in the same direction
as for vocalization latency but it did not reach
the same level of significance, F(1, 24) = 2.7,
p = .11, as was also true of the effect of words
versus pseudowords, F(1, 24) = 3.57, p =
.071. The a posteriori comparison of the
base condition with the +A condition
showed that simply having heard and pro-
nounced a word lowered the matching time
(p < .05). In general, then, the same pattern
of results emerged from both dependent
measures. An inspection of the means in
Table 3 shows that the pattern of means over
the within-subjects conditions is approxi-
mately the same for both measures, with the
magnitude of reader differences being larger
for vocalization latencies.

Table 3
Mean Matching Latencies in Experiment 2 (in sec)

                     Pseudoword experience              Words
                                                   Medium     High
Group           Base   +A    A + M   +P    P + M   frequency  frequency

First test
  Skilled       1.42   1.27  1.98    1.18  1.18    1.21       1.11
  Less skilled  1.63   1.27  1.48    1.27  1.34    1.40       1.26
10-week retest
  Skilled       1.20   1.07  1.21    1.11  1.05    1.09       1.04
  Less skilled  1.37   1.27  1.25    1.28  1.20    1.20       1.21

Note. +A = aural only; A + M = aural plus meaning; +P = print only; P + M = print plus meaning.

Other significant effects were Sex X
Reading Skill, F(1, 4) = 28.75, p < .01, and
IQ X Reading Skill X Condition, F(24, 280)
= 1.645, p < .05, and IQ, F(24, 280) = 1.645,
p < .05. Note that there was no Sex X
Reading Skill interaction in the VL measure.
While there was virtually no reading skill
effect for males, there was a large effect for
females, with skilled female readers being
faster than less-skilled readers. This effect,
however, did not interact with condition (F
< 1).


Vocalization Latencies, Retest


The pattern of results after 10 weeks was
essentially the same as for the immediate
test, although the effect of reading skill was
somewhat larger, F(1, 4) = 26.98, p < .01.
The mean VL for each condition is shown in
Table 2. The effect of condition was the
same as for the first test.


Matching Latencies, Retest


As with the first test, the reading skill ef-
fect did not reach significance, F(1, 4) = 1.61,
p < .27, and neither did the Reading Skill X
Condition interaction (F < 1). However, the
effect of condition was significant, F(6, 24)
= 6.32, p < .01, and therefore was further
examined by orthogonal contrasts as before.
Again, pseudowords that had been seen were
matched faster than those that had not been
seen, F(1, 24) = 5.65, p < .05; the experience
conditions were faster than the base condi-
tion, F(1, 24) = 15.27, p < .001; and there
was no effect of having a meaning (F < 1).
In contrast to the immediate test, the effect
of real words versus pseudowords was sig-
nificant, F(1, 24) = 11.39, p < .01, reflecting
shorter reaction time for real words. Addi-
tionally, the Print X Meaning interaction
was marginally significant, F(1, 24) = 4.63,
p = .042. The four means (Table 3) seem to
indicate that when the item was seen in
print, having a meaning decreased matching
latency, but when a word had only been
heard, meaning experience somewhat in-
creased matching latency. The a posteriori
comparison of the base condition with the
+A condition again showed that simply
having heard a pseudoword reduced the
matching time (p < .05).


Discussion 


The results of Experiment 2 support sev-
eral conclusions and raise some questions.
The conclusions that seem to be supported
are these: (a) When experience with pseu-
dowords is controlled, skilled readers have
shorter VLs than less-skilled readers. (b) 
Experience merely in hearing (and saying) 
pseudowords leads to decreased VLs for 
these pseudowords. The effect of this ex- 
perience is independent of reading skill. (c) 
Experience in seeing and responding to a 
pseudoword in print reduces VLs more than 
aural experience with the pseudoword. (d) 
Whether with aural or print experience, 
meaning does not affect VL. (e) The mag- 
nitude of the experience effect on VL is 
substantial. Print experience reduced VLs
for two-syllable pseudowords to the extent 
that their VLs were not different from those 
of two-syllable real words. (f) The effects of 
experience on VL are enduring. Despite the 
meager 3-day experience, the effects of ex- 
perience persisted at least 10 weeks after the 
actual experience. 

As for differences between individuals of 
different reading skill, the most significant 
conclusion is that the effects of skill level and 
type of word experience on latency measures 
seem to be statistically independent. For 
vocalization latency there was no interaction 
between skill level and experience. This 
suggests that as far as vocalization speed is 
concerned, the type of experience that is 
good for the skilled reader is good for the 
less-skilled reader. Another conclusion with 
respect to reading skill is that the same 
amount of experience does not lead to the 
same final level of performance for the less- 
skilled reader as for the skilled reader. 
Gains from experience are comparable, but 
level of performance is not. 

The questions raised by the results of 
Experiment 2 have to do with the relation of 
decoding measures to reading skill. There 
are two facts to deal with. (a) The reaction 
time differences between skill groups were 
smaller for the matching task than for the 
vocalization task. While the differences 
always favored the skilled readers, these 
differences were not significant for the 
matching task. (b) The experience condi- 
tions had identical effects on both vocaliza- 
tion and matching latencies. Because the 
experience conditions affected matching 
latencies, it may be concluded that word 
experience affected the recognition of let-
ter-string units. However, reading skill
differences were not large or significant for 
matching latencies. Therefore, when reader
differences are observed in a VL task, these 
differences cannot be attributed to letter- 
string recognition. It might be hypothe- 
sized, therefore, that the observed reader 
differences in VL may be mainly due to 
motor programming and/or some other 
postrecognition process. Perfetti, Finger, 
and Hogaboam (1978), however, have found 
that if vocalization tasks do not involve 
words, there is no difference between skilled 
and less-skilled readers. Thus, motor pro- 
gramming speed, per se, may not account for 
reader differences in vocalization tasks that
involve words. It is likely, then, that VL 
differences to words are due to some com- 
ponent other than motor programming that 
is not measured by matching latencies. One 
possibility would be the ability to easily 
utilize the phonetic codes that are involved 
in both normal reading and in the vocaliza- 
tion of single words. 


Experiment 3 


While Experiment 2 demonstrated some 
striking effects of qualitative experience, the 
total amount of experience was fixed. Ex- 
periment 3 was designed to explore possible 
effects of varying the total quantity of ex- 
perience. Since Experiment 2 showed that 
providing meaning did not affect vocaliza- 
tion latency, Experiment 3 was concerned 
with varying the number of printed and 
aural exposures to pseudowords. 


Method 
Subjects 


Subjects were 12 third graders divided into two levels
of reading skill. The skilled group was above the [...]
percentile on the reading subtest of the Metropolitan 
Achievement Test, while the less-skilled group was 
below the 40th percentile. IQs were not matched, but 
all subjects had an Otis-Lennon score of at least 90.


Materials and Design 


The design included three variables of
interest: number of exposures (0, 3, 6, 12, or 18), experi-
ence modality (visual presentation vs. aural presenta-
tion), and reading skill (skilled vs. less skilled). Three
pseudowords (CVCVCs) were randomly assigned to
each of the 10 cells defined by orthogonally varying
modality and number of exposures. This assignment
defined the first material set. Two additional material 
sets were produced by randomly reassigning the pseu- 
dowords of the first material set, subject to the con- 
straint that across material sets the same item did not 
occur twice in the same cell. This created material set 
as a fourth factor. Three subjects in each reading-skill
level were randomly assigned to each material set, for
a total of six subjects at each level of reading
skill. 
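
The constrained reassignment described above can be sketched as follows. This is only an illustration under stated assumptions: the item labels, the function name `make_material_sets`, the seed, and the use of rejection sampling are all hypothetical, since the text does not say how the reassignment was actually carried out.

```python
import random

def make_material_sets(items, n_cells=10, n_sets=3, seed=0):
    """Return n_sets dicts mapping each item to a cell index (0..n_cells-1).

    Hypothetical sketch: items are reshuffled until no item lands in a cell
    it already occupied in an earlier material set (rejection sampling).
    """
    rng = random.Random(seed)
    per_cell = len(items) // n_cells  # 30 items / 10 cells = 3 per cell
    sets = []
    while len(sets) < n_sets:
        shuffled = list(items)
        rng.shuffle(shuffled)
        candidate = {item: pos // per_cell for pos, item in enumerate(shuffled)}
        # Accept only if every item avoids all cells it held previously.
        if all(candidate[item] != earlier[item]
               for earlier in sets for item in items):
            sets.append(candidate)
    return sets

items = [f"pseudoword_{k:02d}" for k in range(30)]  # placeholder labels
material_sets = make_material_sets(items)
```

Each returned mapping places three items in every one of the 10 modality-by-exposure cells, and no item occupies the same cell in two material sets.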

Each pseudoword was typed on a white index card in
lowercase elite type and photographed for 2 X 2 inch
(5.08 X 5.08 cm) slides. Additionally, slides of six real
words, three high frequency and three medium fre- 
quency, were made. These same cards were retained 
and used in the print conditions. 

For the print conditions, three decks of index cards
were prepared, one deck for each material set. The
pseudowords appeared 1, 2, 4, or 6 times in each deck,
so that viewing the decks 3 times would produce 
frequencies of 3, 6, 12, and 18. 


Procedure 


The experimenter worked with each subject indi-
vidually for about 10 minutes each day for 3 days. On
the first day the subjects were told that they would not
learn the meanings until later. Each day every subject
received both the visual and aural conditions. In the 
visual conditions, the subject was instructed to look at 
the word on each card as the experimenter presented 
it, listen to the pronunciation, and then repeat it. For 
the aural conditions the subject simply listened to the 
word and repeated it. Thus, the print and aural con- 
ditions were analogous to the +P and +A conditions of 
Experiment 2. The subjects were not informed of the
vocalization task until the fourth day, when vocalization 
latencies were collected on all pseudowords and on the 
medium- and high-frequency words. The apparatus 
and procedure were the same as in Experiment 1.


Results 


Since one less-skilled subject failed to re-
spond on the test of vocalization, data for
this subject and the skilled subject receiving
the same material set were discarded. The
analyses were then based on 10 subjects,
per group, collapsed over material sets. The
mean VLs are plotted in Figure 1. The
mean VL for zero frequency represents six
pseudowords, three that were assigned to the
zero-frequency-aural presentation cell and
three assigned to the zero-frequency-print


were collected. Because the zero-frequency
point is necessarily the same for both types
of presentations, a single analysis would ar-
tificially introduce an interaction effect.
Therefore, two analyses were used, one for 
the aural conditions and one for the print 
conditions. All latencies were changed to 
speed scores for the analysis reported below, 
as the data were positively skewed. An 
analysis of the latency scores, however, rev- 
ealed the same pattern of results. 
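
As an illustration of the transformation mentioned above, the following sketch converts latencies to speed scores. The text does not state the exact transformation, so the reciprocal (responses per second) used here is an assumption, and the latency values are invented for the example.

```python
def to_speed(latency_msec):
    """Convert a latency in msec to a speed score in responses per second.

    Assumption: speed is taken as the reciprocal of latency.
    """
    return 1000.0 / latency_msec

latencies = [800, 900, 1000, 1200, 2500]  # invented, positively skewed
speeds = [to_speed(x) for x in latencies]

# The single long latency (2500 msec) lies far from the rest on the
# latency scale but is pulled toward the others on the speed scale.
latency_range_ratio = max(latencies) / min(latencies)
speed_range_ratio = max(speeds) / min(speeds)
```

A reciprocal transform of this kind compresses the long right tail of a latency distribution, which is the usual motivation for analyzing speed rather than raw latency.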

For the aural conditions, the skilled
readers were faster overall than the less-
skilled readers, F(1, 8) = 10.81, p < .01.
There was also a significant effect of number
of exposures, F(4, 32) = 4.82, p < .01, and a
significant interaction between reading skill
and number of exposures, F(4, 32) = 3.63, p
< .05. There was also substantial variability
of subjects within each reading-skill level, 
F(8, 32) = 11.69, p < .01. Analysis of the 
print conditions showed that the same fac- 
tors significantly influenced the vocalization 
latencies. 

To determine if modality had any effect,
the zero-frequency cells were eliminated and
an analysis of variance was performed on the 
remaining data with modality as a factor. 
Visual exposure consistently produced 
shorter average VLs, but the difference was 
not significant, F(1, 8) = 3.27, p = .11. 
Modality also did not interact with reading 
ability, number of exposures, or any other 
interaction of factors. 

The average VLs for real words are shown 
in Figure 1. While the reading ability dif- 
ference was reliable, F(1, 16) = 11.62, p < 
.05, the apparent interaction was not. 


Discussion 


The results of Experiment 3 suggest that 
skilled readers benefit more from minimal 
aural exposure than do less-skilled readers. 
Both skilled and less-skilled readers ap- 
peared to benefit more from print experience 
than from aural experience although this 
difference did not reach statistical signifi-
cance. For skilled readers, the rate of de-
crease in vocalization latency as a function
of exposure frequency was comparable for 
print and aural exposure, whereas for less- 
skilled readers there appeared to be very 
little effect of minimal (three) aural expo- 
sures. This conclusion must be tentative 
because it rests on the assumption that the 
zero points for aural and print conditions 
were identical. However, this assumption 
is consistent with the fact that pseudowords 
were chosen to have equal preexperimental 
vocalization latencies. 

Given this caveat, however, the data do 
suggest the following. Skilled readers after 
3 exposures respond as quickly to pseudo- 
words as to high-frequency real words. 
However, since the real words had not been 
experienced in the experiment, pseudowords 
had greater recency than real words. For 
skilled readers, 18 aural exposures to pseu- 
dowords were equivalent to 18 printed ex- 
posures, producing latencies equal to high- 
frequency words. For real words, Figure 1 
shows a greater frequency effect for less- 
skilled readers than for skilled readers, thus 
replicating a frequency interaction found 
previously (Perfetti & Hogaboam, 1975). 

That skilled readers appeared to reach
asymptote sooner than less-skilled readers
may imply that further experience may re- 
duce the difference between skilled and 
less-skilled readers. A striking fact is that 
even after 18 print exposures, less-skilled 
readers were about 400 msec slower than 
skilled readers for pseudowords that the 
latter had never heard or seen. This allows 
the conclusion that skilled readers, although 
their skill is defined by a comprehension test, 
bring much greater decoding skill to their 
encounters with rarely experienced words 
than do less-skilled readers. 


General Discussion 


These experiments have suggested some
similarities and differences in how young 
readers' vocalization latencies for pseudo- 
words are affected by experience. On the 
one hand, skilled and less-skilled readers 
showed similar qualitative patterns. 
Hearing and saying the pseudowords led to 
shorter VLs for both skilled and less-skilled 
readers, but neither group was affected by
meaning experience. On the other hand,
there were some quantitative differences in the
effect of these experiences on skilled and
less-skilled readers. All three experiments 
showed a very large difference in latency 
between skilled and less-skilled readers in 
the absence of any prior exposure to pseu- 
dowords, in agreement with Perfetti and 
Hogaboam (1975). Moreover, equal expo- 
sures did not lead to equal reaction times. 

The results of Experiments 2 and 3 are 
important in demonstrating no qualitative 
differences between skilled and less-skilled
readers. Note that qualitative differences 
are not beyond plausibility. For example, 
one might have expected less-skilled readers
not to benefit from aural experience. Ex-
periment 2 demonstrated the benefit of
limited aural exposure for less-skilled read- 
ers, and Experiment 3 found that less-skilled 
readers continued to benefit from increasing 
exposure frequency beyond the number of 
exposures beneficial to skilled readers.

There was no effect of meaning on either
reader group. The general principle seems
to be that experiences with letter strings that
facilitate decoding for skilled readers benefit 
less-skilled readers as well. The kind of 
experience that is facilitative is provided 
even by aural encoding, although print ex- 
perience helps more. The extra benefit of 
prior print experience in decoding is pre- 
sumably in the identification of letter strings 
for translation into a phonetic string. The
effect of prior aural encodings is presumably 
on the availability of a phonetic string once 
translation from the graphemic string has 
occurred. It is of interest to note that skilled 
readers were better both at what might be 
called “true decoding” and at “word recog- 
nition.” The former case is represented by 
a reader's first encounter with words previ-
ously known only aurally and the latter by
encounters with words previously seen in 
print. 

Another important conclusion from these 
studies is that subword decoding processes 
are implicated as a source of difference be-
tween skilled and less-skilled readers. The
pattern of results from Experiment 1 show-
ing number of syllables and real word-
pseudoword interactions with reading skill
suggests this interpretation. Additionally,
the superiority of good readers when both 
quantity and quality of previous word ex- 
perience are equated suggests that decoding 
differences are not directly attributable to 
differential word experiences. 

The locus of decoding effects, however, 
remains an issue. It is not satisfactorily 
clarified by a comparison of the two decoding 
measures of Experiment 2. Furthermore,
the fact that the matching latencies were 
always measured 1 day after the vocalization 
latencies introduces an unknown order ef- 
fect. Also, the many ways in which a subject 
can perform same-different judgments on 
letter strings does not help isolate the sup- 
posedly response-free component processes. 
Nevertheless, the hypothesis that a major 
source of individual differences in reading 
skill resides in the ability to quickly activate 
a phonetic code may be worth considering. 
This could mean that less-skilled readers 
have lower-quality phonetic representations 
for a given level of verbal experience. Or, 
closer to the usual notion of decoding, it 
could mean that the translation stage in 
which letter sequences are assigned phonetic 
codes is slower for less-skilled readers, even 
though initial grapheme identification is 
sufficiently rapid. Note that this last pos- 
sibility is compatible with the syllable in- 
teraction of Experiment 1 if, as Spoehr and 
Smith (1973; Smith & Spoehr, 1974) sug- 
gested, vocalic center groups, which are 
typically syllabic units, provide the input for 
translation. The present experiments, of 
course, cannot attend to these possibilities,
and it may well be that more than one com-
ponent of print-to-sound decoding is a
source of skill differences in comprehen-
sion.

Whatever the underlying process, the fact 
that decoding speeds are rather easily af-
fected by simple familiarizing experiences
may be of practical value, especially since
this effect lasts at least 10 weeks. However,
given our present knowledge, these results
may only be useful for an objective of in-
creased decoding speed. These results are
of unknown value for an objective of in- 
creased comprehension. 


Sources of Vocalization Latency Differences Between
Skilled and Less Skilled Young Readers 


Charles A. Perfetti, Elyse Finger, and Thomas Hogaboam 
University of Pittsburgh 


Vocalization latencies of skilled and less skilled young readers were found to 
be a function of set size, number of syllables, and stimulus material. Differ- 
ences between skilled and less skilled readers were absent for naming colors, 
digits, and pictures. Differences were found for words, and differences in- 
creased with number of syllables (and letters). While set-size effects were ob- 
served equally for skilled and less skilled subjects for colors and digits, only 
less skilled readers were substantially affected by set-size increases with 
words. Inefficiency in alphabetic verbal coding rather than use of informa-
tion constraint or word retrieval seems to be the major source of reader differ-
ences in vocalization latencies.


Children skilled in reading produce 
shorter vocalization latencies to isolated 
printed words and pseudowords than chil- 
dren less skilled in reading (Perfetti & Ho- 
gaboam, 1975). However, in normal read- 
ing, word identification is aided by a variety 
of textual and other constraints. Accord- 
ingly, a pertinent question for the study of 
the component processes of reading assumed 
to be indexed by vocalization latencies is the 
role of vocabulary set size. As the possible 
universe of items increases in a vocalization 
or naming task, reaction times for at least 
some types of items increase (Morin, Konick, 
Troxell, & McPherson, 1965), although 
published evidence for printed words seems 
to be lacking. It is of interest to know the 
extent to which skilled and less skilled 
readers use this knowledge of vocabulary 
constraint to increase speed of decoding. 

A second question is whether reader dif- 
ferences in vocalization latency (VL) are 
confined to verbal material. Certainly, if 
differences in VL between skilled and less 
skilled readers are found over a range of
stimuli, factors more general than decoding 
or word recognition are implicated. In 
particular, differences in speed of name re- 
trieval from long-term memory would be 
implied by general differences in vocalization 
latency. 

Five experiments were carried out to ex- 
amine various aspects of these two questions. 
Our hypothesis (Perfetti & Hogaboam, 1975; 
Perfetti & Lesgold, in press) is that differ- 
ences in reading comprehension skill are in 
large part due to differences in the knowl- 
edge and use of verbal codes, including the 
extent to which such codes are automatically 
activated (LaBerge & Samuels, 1974). A 
derivate of this hypothesis is the prediction 
that differences between skilled and less 
skilled readers in VL will occur for verbal 
stimuli but not for colors, digits, and pic- 
tures. While the naming of these latter 
stimuli is a verbal response that requires 
name retrieval, the stimuli are nonalphabe- 
tic. By the same reasoning, set-size effects 
should be comparable for skilled and less 
skilled readers so long as the items are 
nonalphabetic. Furthermore, stimulus 
factors such as number of syllables should 
affect skilled and less skilled readers simi- 
larly so long as the items are nonalphabetic. 
These predictions are based on the as- 
sumption that such effects in nonalphabetic 
stimuli are mainly on processes other than 
verbal coding, that is, response selection or 
response programming (e.g., Klapp, Ander-
son, & Berrian, 1973; Theios, 1973).


General Method 


Subjects 


Subjects were 16 skilled and 16 less skilled third 
graders, as measured by the Reading subtest of the 
Metropolitan Achievement Tests, Elementary Form H 
(MAT; Durost et al., 1970), which was administered by 
the research staff in the fall term. The skilled group 
included 10 females and 6 males. Mean Reading sub- 
test percentile for this group was 78.44 (SD = 13.92). 
Mean IQ, as measured by the Otis-Lennon Test of 
Mental Ability, Elementary Level I (Otis & Lennon, 
1967), which was administered by the school the pre- 
vious year, was 114.44 (SD = 10.31). The less skilled 
reader group included 9 females and 7 males. The 
mean Reading subtest percentile was 15.15 (SD =
12.47). IQ scores for 3 of these children were not
available. The mean IQ for the remaining 13 subjects
was 103.5 (SD = 8.29).


Procedure 


For all experiments, stimuli were presented to 
subjects individually as 35-mm transparencies by a 
Kodak Carousel projector equipped with a solenoid- 
operated shutter. For verbal stimuli, lowercase elite 
type was used, and the transparencies were negatives 
(light on dark). For all tasks, the subject sat at a table 
approximately 30 cm from the viewing screen. A digital
timer (Hunter Klockcounter) recorded the individual 
response times measured from the opening of the 
shutter to the onset of vocalization, which triggered a 
voice-operated relay. Experimental stimuli were al- 
ways preceded by several practice stimuli. Depending 
on the experiment, subjects were instructed to say the 
word, number, color, or picture name as quickly as 
possible. Instructions emphasized accuracy as well as 
speed and pointed out that the onset of vocalization 
terminated the appearance of a slide. 


Experiment 1: Colors 


Materials and Design 


For color naming, eight typical hues of blue, black, 
red, green, white, yellow, purple, and orange were pho- 
tographed. The eight samples were each represented 
by seven replications. 

The independent variable was set size, a within- 
subjects factor that assumed the two values of 2 and 8. 
For Set Size 2, the colors blue and yellow were used. 
For Set Size 8, the full range of 8 colors, including blue 
and yellow, were used. For Set Size 8, each color oc- 
curred 5 times for 40 trials. Two orders of the 40 slides 
were used, one the reverse of the other. Half of the 
subjects received each order. For Set Size 2, each color 
occurred 10 times for a total of 20 trials, again with two 
random orders. Subjects participated in Set Size 2 on
the first day and in Set Size 8 on a second day. 


Results 


The median vocalization latencies for each 
subject were the basic data for these exper- 
iments, thus reducing the impact of occa- 
sional long latencies. However, parallel 
analyses were routinely made of subject 
means and provided identical patterns of 
statistical significance in every case, with the 
exception noted in Experiment 4. The 
means of these individual medians are the 
data reported below and shown in Figure 1.
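
A brief sketch of why subject medians reduce the impact of occasional long latencies, and of how the reported group value (a mean of individual medians) is formed. The trial latencies below are invented for illustration and are not data from the experiments.

```python
from statistics import mean, median

# Invented trial latencies (msec) for two hypothetical subjects;
# the first subject has one unusually long latency.
subject_trials = [
    [700, 720, 710, 690, 2400],
    [650, 640, 670, 660, 655],
]

subject_medians = [median(trials) for trials in subject_trials]
subject_means = [mean(trials) for trials in subject_trials]

# Group value as described in the text: the mean of individual medians.
group_mean_of_medians = mean(subject_medians)
```

For the first subject the median stays at 710 msec, while the mean is drawn up to 1,044 msec by the single long trial.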

For the larger set size, the mean VLs were 
916 and 873 msec for skilled and less skilled 
readers, respectively. For the smaller set 
size, they were 721 and 703, respectively. A 
two-factor analysis of variance showed a 
significant effect of set size, F(1, 30) = 31.52,
p < .001, but no effect of reading skill (F <
1). There was no interaction of set size with 
reading skill (F < 1). 

A second analysis was done on just the 
blue and yellow stimuli, the two colors 
common to both set-size conditions. As with 
the analysis of the full data, set size was sig- 
nificant, F(1, 30) = 31.53, and reading skill 
was not (F < 1). 

The data for only blue and yellow may give 
a more valid picture of the effect of set size.
When blue and yellow were the only colors 
the subjects were seeing, the mean VL was 
714. When blue and yellow were 2 of 8 
possible stimuli, the mean VL for blue and 
yellow was 866. This rather powerful saving 
of 152 msec as a result of reduced number of 
alternatives was observed for both levels of 
reading skill. 
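
The saving quoted above follows directly from the two reported means. The short check below uses only those two values; the variable names and the relative-saving figure are added for illustration.

```python
vl_set_size_2 = 714  # msec, blue and yellow as the only possible colors
vl_set_size_8 = 866  # msec, blue and yellow as 2 of 8 possible colors

saving_msec = vl_set_size_8 - vl_set_size_2
# Saving expressed as a proportion of the larger-set latency (illustrative).
relative_saving = saving_msec / vl_set_size_8
```

The difference is 152 msec, or roughly 18% of the latency observed with eight alternatives.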


Experiment 2: Digits 


Digits share with colors the property of 
being nonalphabetic, and like colors, digit 
names are well established for most children. 
They also provide an opportunity to examine 
the effects of large set size and number of 
syllables. It was of particular interest to 
examine syllable effects in digit vocalization 
as a control for such effects in word vocali-
zation. Effects of number of syllables on
digit VL have been previously reported
(Klapp, 1971, 1974; Eriksen, Pollack, & 
Montague, 1970). Furthermore, set size can 
be manipulated with digits so that the large 
set size can be safely beyond the limitations 
of short-term memory. 


Materials and Design 


The design was a 2 X 2 X 3 mixed-factor design. The 
within-subjects variables were set size (3 or 100) and 
number of syllables (two, three, or four), while reading 
skill was the between-subjects variable. 

For the larger set of numbers, there was a subset of 
21 experimental items comprising 7 samples of two-, 
three-, and four-syllable numbers. Thus, the two-syl-
lable numbers were 20, 30, 40, ..., 90; the three-syllable
numbers were 25, 35, 45, ..., 95; and the four-syllable
numbers were 27, 37, 47, ..., 97. The reason for this
is that controlling for number of digits at two leaves only 
numbers ending in seven to serve as four-syllable 
numbers and only numbers ending in zero to serve as 
two-syllable numbers (except for six “teen” numbers). 
Thus, to gain control over the initial syllable, which may 
be especially important for VL, the set of 21 experi-
mental numbers was balanced for initial digit and varied
in second digit. This list of 21 items was embedded in
a larger list of 48 numbers. The remaining 27 included 
two three-syllable fillers, which sampled the remaining 
range between 20 and 99, and the six two-syllable 
numbers 13, 14, 15, 16, 18, and 19, which provided a 
comparison with the experimental two-syllable num- 
bers, which were necessarily of higher token frequen- 
cy. 
The reduced set of numbers was represented by 20, 
25, and 27. The reduced set was presented as blocks 
with blank slides randomly interspersed. For a block 
of trials, the subject had only one possible stimulus; if 
the stimulus was absent, there was no response. With 
Set Size 1, within a block of trials, the reduced-set 
condition was essentially a simple reaction time ex- 
periment. This procedure allows a clear interpretation 
of any syllable effect. With only one number possible, 
any observed syllable effect would have to be in re- 
sponse preparation. There were 10 trials for each 
number within a block. Block order was randomized. 
Subjects had the reduced-set condition on the first day 
and the larger set on the second day. 


Results and Discussion 


Median latencies of each subject for each 
stimulus-type were the basic data. Again, 
analyses using means and medians produced
the same pattern. Mean VL for skilled
readers was 821 msec, and for less skilled 
readers, mean VL was 854 msec, F(1, 30) = 
1.39, p > .20. Mean VL for experimental 
items from the large set was 957 msec com- 
pared with 718 msec for the reduced set, F(1, 
30) = 102.29, p < .001. While the main ef-
fect of number of syllables was significant,
F(1, 30) = 7.75, p < .001, the syllable effect
was confined to the larger set conditions,
producing a significant Set Size X Number
of Syllables interaction, F(1, 30) = 10.94, p <
.001. For the reduced set, mean VLs were
720, 715, and 718 for two-, three-, and four-
syllable numbers, respectively; the corre-
sponding latencies were 893, 980, and 998 for
the larger set. Neither stimulus effect in-
teracted with reading skill, Reading Skill X
Set Size, F(1, 30) = 1.39, p > .20, and Read-
ing Skill X Number of Syllables (F < 1).

The effects on vocalization latency of 
number of syllables to be vocalized agree 
with Klapp (1971, 1974) and Klapp et al. 
(1973). As Klapp et al. (1973) suggested, 
such effects reflect processes preparatory to 
vocalization rather than perceptual analysis. 
The present data further suggest that these 
effects may be partly a matter of response 
selection from long-term memory, since no 
syllable effects were found in the simple re- 
action time situation of reduced set size—a 
situation that keeps all possible responses 
active in working memory. 


Experiment 3: Pictures and Words 


Reader skill differences in vocalization 
latencies to printed words and pseudowords 
have been previously established (Perfetti 
& Hogaboam, 1975). The preceding ex- 
periments can be interpreted as ruling out 
the possibility that such differences are in 
the strictly response components of vocali- 
zation. To strengthen the conclusion that 
identification of words or letter strings is the 
source of reader differences, another com- 
parison is desirable. In particular, it is 
possible that the source of vocalization la- 
tency differences to words is due to the re- 
trieval of word-production information, that 
is, the phonological form of the word and its 
articulation program. Experiment 3 pro-
vides a test of this possibility by comparing
VL to pictures of objects with VL to words 
that named the objects. In the two cases, 
the subject produces the same word. Only 
the stimulus is different. 


Design and Materials 


Forty-two objects with common unambiguous names 
served as stimuli. Line drawings of the objects were 
photographed for the picture condition, and the names 
of the objects were printed in lowercase and photo- 
graphed for the word condition. The objects repre- 
sented a wide semantic range, for example, kite, ham- 
mer, and giraffe. 

The words were all nouns controlled for printed fre- 
quency according to the Carroll, Davies, and Richman 
(1971) norms for third grade. The mean relative 
frequencies (relative to 840,857) were 32.6 for one-syl-
lable words (range, 1-78), 31.1 for two-syllable words
(range, 11-90), and 32.1 for three-syllable words (range, 
1-83). The mean number of letters was 5.08 (range, 
4-6), 6.64 (range, 5-8), and 8.00 (range, 5-11) for one-, 
two-, and three-syllable words, respectively. Some 
multisyllables were word compounds (e.g., football). 

Since the same stimuli were used in both word and 
picture vocalization, half of each subject group got 
pictures first and half got words first. Thus, task order 
and reading skill were between-subjects variables, and 
task (picture vs. word) and number of syllables (one, 
two, or three) were within-subjects variables. 


Procedure 


The basic vocalization procedure was used. How- 
ever, prior to data collection, each subject was shown 
cards containing the pictures and asked to name 
them. 


Results and Discussion 


An analysis of variance on median VLs 
indicated significant main effects of reading 
skill, F(1, 14) = 27.16, p < .001, task, F(1, 28) 
= 10.18, p = .007, and number of syllables, 
F(1, 28) = 17.29, p < .001. There was no 
significant effect of task order (F = 1.16).
However, there were significant interactions
of Reading Skill X Task, F(1, 14) = 30.73, p
< .001, and Reading Skill X Number of
Syllables, F(1, 14) = 5.16, p = .01, and Task
X Number of Syllables, F(1, 28) = 7.32, p <
.01. Furthermore, the three-way interaction
of Reading Skill X Task X Number of Syll-
ables was significant, F(2, 28) = 3.79, p = .03.
The interaction of Reading Skill X Task X
Task Order was marginally significant, F(2,
28) = 3.50, p = .08, suggesting the possibility
that less skilled readers were helped some-
what more than skilled readers on the word
task when the word task had been preceded
by the picture task. Except for this last
mentioned marginal interaction, the pattern
of significance was replicated when subject
scores were means instead of medians.

As far as the syllable effect is concerned, 
the results suggest that even when print is 
not involved, producing more syllables 
slightly increases VL. For skilled readers, 
there was about a 60 msec increase from one- 
to two- and three-syllable picture-names. 
Although larger, this is in line with the sig- 
nificant syllable effect found by Klapp et al. 
(1973) for adult subjects. 

The key findings, however, are those in- 
volving reading skill. Skilled and less skilled 
readers showed comparable syllable effects 
for pictures, the task for which such effects 
can be attributed to response programming. 
However, as indicated by the Skill X Task X 
Number of Syllables interaction, skilled 
readers showed less of an increase for words 
compared with less skilled readers. Skilled 
readers increased from 995 to 1,051 to 1,079 
as number of syllables increased from one to 
three. Less skilled readers had corre- 
sponding means of 1,336, 1,434, and 1,668. 
The magnitude of the increase for skilled 
readers was 84 msec (8%) compared with 332 
msec (25%) for less skilled readers. 
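
The percentages quoted above follow directly from the reported means. A minimal sketch of that arithmetic, in which the function name `latency_increase` is an illustrative assumption and not part of the original study:

```python
def latency_increase(one_syllable_ms, three_syllable_ms):
    """Return (increase in msec, percent increase relative to the one-syllable mean)."""
    increase = three_syllable_ms - one_syllable_ms
    percent = 100.0 * increase / one_syllable_ms
    return increase, percent

# Mean word vocalization latencies (msec) reported in the text
# for one-syllable and three-syllable words.
skilled = latency_increase(995, 1079)
less_skilled = latency_increase(1336, 1668)

print(skilled)
print(less_skilled)
```

Expressing the increase relative to the one-syllable mean is what yields the rounded 8% and 25% figures.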

The most striking comparison is the pic- 
ture-word difference. For skilled readers, 
words were slightly faster to vocalize than 
the corresponding pictures. However, for 
less skilled readers, words averaged 372 msec 
longer than pictures. This result implicates 
word coding, not merely name retrieval, as 
a source of difficulty for the less skilled 
reader. 


Experiment 4: Categories 


The experiments so far have suggested 
that in addition to item recognition, vocali- 
zation latencies have a response selection 
component and a response programming 
component and that skilled and less skilled 
readers do not differ in selection or pro- 
gramming. However, the selection variable 
has only been tested with colors and num- 
bers. 

For reading words, there are two relevant 
aspects of response selection. One is the size 
of the semantic category to which the word 
belongs. For example, the sets of animal 
and proper names are both indefinitely large 
or open and the sets of the four seasons and 
the twelve months are definitely small or 
closed. The other relevant aspect of re- 
sponse selection is whether specific items are 
activated in memory. While the entire set 
of animals, or proper names, or even the 12 
months cannot be all simultaneously active 
in short-term memory, a small number of 
specific instances, say four or five, can be. 
Of the four categories mentioned, only the 
four seasons seem a likely candidate for a 
semantic set in which both the category and 
its individual members can be kept active in 
memory. Thus, these categories provide 
comparisons of interest for Experiment 4 in 
which the potential set size (category size) 
and the actual set size (number of items) are 
independently examined. 


Materials and Design 


Four item sets were produced: Seasons (fall, sum- 
mer, winter, and spring) and months (all 12) provided 
closed sets with items of known frequency and number 
of syllables. Thus, two open sets were produced to pair 
with the fixed sets. Animals (mouse, deer, tiger, and 
kitten) were paired with seasons, and proper names (12 
of them) were paired with months. Both paired cate- 
gories were equal with respect to number of syllables, 
but only months and names could be equalized in 
printed frequency. 

For each category, all items occurred twice, but all 
items occurred once before any occurred a second time. 
Thus, there were 24 vocalization trials for months and 
proper names and 8 trials for seasons and animals. The 
order of the four categories was varied, with 
eight subjects for each order. 

The experiment was treated as a 2 X 4 X 2 X 2 
(Reading Skill Groups X Task Order X Actual Set Size 
X Potential Set Size) factorial design, with two be- 
tween-subjects factors. Of course, by this design, the 
actual-set-size and potential-set-size variables were 
represented by the specific categories described. Actual 
set sizes were 4 or 12, and potential set sizes were closed 
or open. 


Procedure 


The standard vocalization procedure was used. 
Subjects were informed in advance 
what category they would see. The proper names were 
described as names of boys and girls. 


Results and Discussion 


Skilled readers showed shorter median 
VLs (774) than less skilled readers (947) as 
expected, F(1, 24) = 10.89, p < .005. Task 
order was not significant as a main effect or 
interaction (all Fs < 1). 

Both actual set size and potential set size 
were significant, F(1, 24) = 8.85, p = .007, 
and F(1, 24) = 10.93, p = .003, respectively. 
However, there was a significant interaction 
of Actual X Potential Set Size, F(1, 24) = 
16.31, p < .001. Whether a set was open or 
closed was important only for the larger set 
size. Thus, the means were about equal for 
4 seasons (840 msec), 4 animals (826 msec), 
and 12 months (834 msec), and longer for 12 
names (941 msec). 

Reading skill interactions were observed 
with actual set size, F(1, 24) = 7.48, p = .01, 
but not with potential set size (F < 1). 
However, the three-way interaction of 
Reading Skill X Actual Set Size X Potential 
Set Size was also significant, F(1, 24) = 5.17, 
p = .03. In effect, skilled readers were un- 
affected by actual set size for both closed and 
open sets. Less skilled readers were affected 
by increasing set size for open sets but not for 
closed sets. Thus, the differences between 
skilled and less skilled readers were in- 
creased with an increase in set size; the dif- 
ferences were especially large for proper 
names, the large open set. 

There is a problem inherent in testing 
fixed semantic categories because two cate- 
gories having necessary and identical prop- 
erties are hard to find. The results of Ex- 
periment 4 must be interpreted with caution 
because only one category represented each 
level of the independent variable. 


Comparative Results and General 
Discussion 


The five panels of Figure 1 summarize the 
vocalization latency data from the four ex- 
periments. The most important result is 
that only in tasks involving words (last two 
panels) were there significant differences 
between skilled and less skilled readers. 
Furthermore, less skilled readers were more 
affected by number of syllables than skilled 
readers, but only when words were present- 
ed. If response preparation and decoding 
are both affected by number of syllables, 
then the comparison of words with pictures 
and digits certainly suggests that differences 
between skilled and less skilled readers in 
the effect of syllable length are in the de- 
coding stages. Since multisyllabic words in 
Experiment 3 contained more letters than 
monosyllabic words, the decoding interpre- 
tation is strengthened.¹ 

The picture-word comparison further 
seems to rule out some general name re- 
trieval deficit by the less skilled reader. The 
digit vocalization latency bears on this pos- 
sibility to some extent, but compared with 
digit names, picture names are lexically more 
interesting. To the extent that naming 
pictures of objects involves the retrieval of 
words on the basis of semantic (perceptual) 
cues, word retrieval does not seem to be the 
main problem of the less skilled reader. 
Rather, it is the access to these words from 
print that is difficult. 

Set-size effects in vocalization latencies 
are sometimes not observed. While the 
present results showed large effects of set size 
for colors and digits, Morin and Forrin 
(1965) found no effect of set size on the digit 
naming speed of third-grade subjects. 
Studies of adults have also failed to find 
set-size effects for digits (Theios, 1973; 
Brainard, Irby, Fitts, & Alluisi, 1962). The 
main difference between those earlier studies 
and the present one appears to be the ex- 
tremely large set size of the present study. 
For example, Morin and Forrin used set sizes 
of 2, 4, and 8 compared with a set size of 100 
used in the present study. Furthermore, the 
digits of Morin and Forrin were the set of 
1-8, so that even the largest set comprised a 
set presumably possible to keep active in 
working memory. A set size of 100 should 
rule out the possibility of keeping the entire 
set active in working memory. However, the 
set-size effects of the color task demonstrate 
that set-size effects do not depend on large 
set sizes. The effect of increasing the 
number of alternatives from 2 to 8 (a two-bit 
increase) was about 180 msec. Morin et al. 
(1965) found a comparable increase over the 
same range (one to three bits) of about 160 
msec for adult subjects. Thus, the present 
results confirm the importance of number of 
response alternatives for colors and suggest 
that for digits, similar effects are to be found 
provided the set size is large enough. 


¹ Since number of syllables and number of letters are 
necessarily correlated in English, it is difficult to sepa- 
rate their relative contributions to decoding measures. 
To vary one while holding the other constant runs the 
risk of confounding frequency of graphemic patterns. 
Thus, while there is some evidence showing that when 
number of letters is controlled, number of syllables is 
not important (Frederiksen & Kroll, 1976), the issue is 
probably not yet closed. 

For questions concerning reading, the ef- 
fect of set size on word vocalization latencies 
is of more interest. The role of context in 
identifying words is partly to constrain the 
possible lexical choices confronting the 
reader. Then, surely knowing that a word 
to be encountered will be 1 of 4 should fa- 
cilitate identification of the word compared 
with knowing that the word could be 1 of 12 
or 1 of an indefinitely large set. Again, the 
limitations on generalizations from single 
semantic categories must be kept in mind. 
Given this qualification, the present results 
suggest that subjects were no faster at 
reading words from a small closed set (the 
four seasons) than from a small open set 
(animals). Indeed, animal names were 
vocalized slightly faster than seasons (36 
msec) by less skilled readers. The smallest 
difference between skilled and less skilled 
readers (about 100 msec) occurred for the 
animals, providing evidence of the ability of 
less skilled readers to use vocabulary con- 
straints to aid word identification. 

Whether the word to be identified could 
be 1 of 4 or 1 of 12 had little effect on the 
skilled reader’s latency. There was, how- 
ever, a 62-msec difference between the closed 
category (months) and the open category 
(names), which were matched on printed 
frequency and number of syllables. The 
theoretical explanation for this is that while 
12 is a large number relative to 4, the entire 
set of 12 months can be simultaneously ac- 
tivated in memory for purposes of this task. 
For a category as large as “names of boys and 
girls,” this is not possible. Notice that it is 
on the indefinitely large set that the less 
skilled reader shows the largest latencies, 
that is, a mean increase of 152 msec over the 
12 months. Although it is always difficult 
to rule out differences in exposure frequency 
to words, names of children seem to be a pool 
of items not especially likely to favor skilled 
readers over less skilled readers from the 
same school and neighborhood. 

With respect to the use of information 
constraints in word identification, readers 
were aided by vocabulary constraints even 
in the large open-set (proper names) condi- 
tion. Although this comparison involves 
different experiments with different mate- 
rials, it is interesting that relative to the 
words of Experiment 3 (see the fourth panel 
of Figure 1), latencies were much shorter (see 
the fifth panel of Figure 1) when the subject 
could expect people's names than when the 
subject did not know what to expect. 

The comparative effects of materials 
across the four experiments may also be ex- 
amined by considering how well the several 
latency measures predict reading achieve- 
ment. Although two groups of subjects were 
identified, the wide range of reading 
achievement scores within both skill 
groupings makes correlation and multiple 
regression reasonable procedures. With raw 
scores of the MAT Reading subtest as the 
measure of reading achievement, each of 
seven reaction time measures from four ex- 
periments was correlated with reading 
achievement and with each other. The 
mean of the reading achievement measure 
was 21, with a range of 6-40. 

The seven vocalization measures were the 
subject medians for the following tasks: 
restricted digits and unrestricted digits 
(Experiment 2), restricted colors and unre- 
stricted colors (Experiment 1), pictures 
(Experiment 3), unrestricted words (Ex- 
periment 3), and restricted words (Experi- 
ment 4). For the case of restricted words, 
the median across the two closed-set condi- 
tions (months and seasons) of Experiment 
4 was used. Thus, these words contrast with the 
words of Experiment 3 in that their category 
membership was known in advance, and the 
category set was closed. 

The results of the simple correlational 
analysis are shown in Table 1. The pattern 
of simple correlations among vocalization 
latency measures shows low to moderate 
correlations. Correlations between unre- 
stricted words and restricted words (r = .50), 
unrestricted words and pictures (r = .38), 
restricted words and pictures (r = .47), and 
unrestricted words and unrestricted digits 
(r = .54) exceeded the .05 level of signifi- 
cance. (However, several other correlations 
were around .30, a reliable correlation by a 
one-tailed test.) Considering that the 
measure in each case was latency of vocali- 
zation, the striking fact is that correlations 
not different from zero were obtained in 
many cases. 

With respect to reading achievement, the 
highest correlation was with latency to unre- 
stricted words (r = -.58), while the correla- 
tion between restricted words and reading 
achievement was the second highest (r = 
-.45). Of the nonverbal tasks, only latency 
to unrestricted digits correlated significantly 
with reading achievement. Thus, consistent 
with the results of the separate experiments, 
the correlations suggest that measured 
reading skill is highly related to speed of re- 
actions to printed words and less related to 
speed of reaction to nonalphabetic stimuli, 
except digits. The digit correlation, not 
present as a group difference in Experiment 
2, is presumably a consequence of using the 
full range of reading achievement. 

It is interesting to consider the predict- 
ability of reading achievement from vocali- 
zation latencies, taking into account the in- 
tercorrelations of the latencies. Although 
the number of subjects is somewhat small (N 
= 32) relative to normal sample size consid- 
erations, there are theoretical reasons for 
examining the multiple regression of some 


of the reaction time measures onto reading 
achievement. In particular, if verbal pro- 
cessing speed is central to reading, then la- 
tency measures to words should predict 
reading achievement better than measures 
to nonword stimuli. Furthermore, insofar 
as skilled readers have developed better 
context-free coding, reaction to items from 
unrestricted sets should predict better than 
items from restricted sets. Pictures involve 
name retrieval but not alphabetic decoding. 
Accordingly, while reactions to pictures 
might predict reading achievement as well 
or better than reaction to unrestricted digits, 
since both involve retrieval from long-term 
memory, they should be less predictive than 
reactions to words. Finally, while there is a 
vocalization onset component to all of these 
tasks, in the simplest situations of restricted 
digits and restricted colors, the words to be 
vocalized are available as preprogrammed 
items in a temporary memory buffer. 

In short, it is theoretically reasonable to 
assume that tasks of vocalization are ordered 
in the degree to which they make demands 
on the processes of identifying words and 
retrieving the motor program(s) associated 
with producing the words. This ordering is 
as follows: (1) nonalphabetic and no long- 
term memory retrieval, (2) nonalphabetic 
and long-term memory retrieval from a se- 
lected set, (3) nonalphabetic and unre- 
stricted long-term memory retrieval, (4) al- 
phabetic and limited long-term memory re- 
trieval, and (5) alphabetic and unrestricted 
long-term memory retrieval. These task 
features correspond roughly to the tasks of 
restricted digits, unrestricted digits, pictures, 
restricted words, and unrestricted words, 
respectively. Accordingly, the median 
latencies for these five tasks were the pre- 
dictor variables for a stepwise multiple re- 
gression analysis. (The two color tasks were 
not included because, as Table 1 shows, they 
did not have nonzero correlations with the 
dependent measure.) The variables were 
entered according to the theoretical ordering 
suggested above, starting with restricted 
digits and ending with unrestricted words. 
The results are shown in Table 2. 


Table 1 
Correlation Matrix for Seven Vocalization Latency Measures and Reading Achievement 

Variable                                         1         2         3         4         5         6         7         8
1. Reading achievement                          --     -.58*     -.45*  [illeg.]      -.29      -.03     -.38*       .17
2. Unrestricted words (Experiment 3)                      --      .50*       .22      .38*       .10      .54*       .04
3. Restricted words (Experiment 4)                                  --       .07      .47*       .18       .23       .23
4. Restricted colors (Experiment 2)                                           --      -.06       .30  [illeg.]       .31
5. Pictures (Experiment 3)                                                              --       .13       .08       .10
6. Restricted digits (Experiment 1)                                                               --  [illeg.]       .19
7. Unrestricted digits (Experiment 1)                                                                       --      -.13
8. Unrestricted colors (Experiment 2)                                                                                 --

Note. For a one-tailed test, the p = .05 level is r = .30. Differences between pairs of (significant) nonzero correlations are not reliable (p > .05). 
* p < .05 (two-tailed test). 


Table 2 
Multiple Regression of Theoretically Ordered Predictor Variables on Reading 

Variable               Additional feature        Multiple correlation   Increase in r²
Restricted digits                                .03                    .00
Unrestricted digits    Name retrieval            .38                    .14*
Pictures               Larger retrieval space    .46                    .07
Restricted words       Alphabetic code           .54                    .09*
Unrestricted words     Name retrieval            .62                    .09*

Note. With all nonword variables partialed out, the correlation of reading with restricted words is -.33 and with unrestricted words is -.42. With all other variables partialed out, the correlation between reading and unrestricted words is -.35. 
* p < .05. 
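
The "increase in r²" column of Table 2 can be approximated from the cumulative multiple correlations alone. The sketch below assumes only the five multiple correlations printed in Table 2; the function name is illustrative. Because it starts from published values rounded to two decimals, one increment (restricted words) comes out as .08 rather than the tabled .09, presumably a rounding difference.

```python
# Cumulative multiple correlations (R) as printed in Table 2,
# listed in the order in which the predictors were entered.
steps = [
    ("Restricted digits", 0.03),
    ("Unrestricted digits", 0.38),
    ("Pictures", 0.46),
    ("Restricted words", 0.54),
    ("Unrestricted words", 0.62),
]

def r_squared_increments(steps):
    """Return (name, increase in R^2) for each predictor added in turn."""
    increments = []
    previous_r2 = 0.0
    for name, multiple_r in steps:
        r2 = multiple_r ** 2
        increments.append((name, round(r2 - previous_r2, 2)))
        previous_r2 = r2
    return increments

increments = r_squared_increments(steps)
for name, gain in increments:
    print(f"{name:20s} +{gain:.2f}")
```

Entering the predictors in a fixed theoretical order means each increment is the variance explained over and above everything entered earlier, which is why the two word tasks are entered last.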

Although the ordering shown in Table 2 is 
not the only one possible, it is theoretically 
motivated. Furthermore, this ordering 
maximizes the possibility of showing sig- 
nificant multiple correlations for the 
nonalphabetic stimuli, given that simple 
correlations with reading achievement were 
highest for the two word tasks. In that 
sense, this is a conservative procedure with 
respect to the theoretical argument. At 
minimum, it demonstrates that even when 
correlations between all other vocalization 
latency measures and reading achievement 
are partialed out, significant correlations 
remain for word vocalization. The correla- 
tion was -.33 between restricted words and 
reading achievement and -.42 between un- 
restricted words and reading achievement 
after partialing out all correlations of reading 
achievement with nonalphabetic tasks. The 
significance of retrieval from a relatively 
unselected verbal long-term memory is re- 
flected by the fact that even after all other 
correlations with reading achievement were 
partialed out, including restricted words, 
there remained a correlation of -.35 between 
unrestricted words and reading achievement. 
At the other end, it is interesting that read- 
ing achievement can be predicted from digit 
latency, provided the set of digits is not 
greatly restricted. The theoretical inter- 
pretation is that rapid retrieval of name 
codes from long-term memory is a significant 
part of reading skill. 
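
The idea of "partialing out" can be illustrated with the standard first-order partial correlation formula. This sketch holds only one variable constant (restricted words), using three simple correlations quoted in the text, so it is an illustration of the technique and does not reproduce the reported value of -.35, which partials out all other measures at once.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Simple correlations quoted in the text.
r_reading_unrestricted = -0.58     # reading achievement vs. unrestricted words
r_reading_restricted = -0.45       # reading achievement vs. restricted words
r_unrestricted_restricted = 0.50   # unrestricted words vs. restricted words

# Reading achievement vs. unrestricted words, with restricted words held constant.
partial = partial_correlation(
    r_reading_unrestricted,
    r_reading_restricted,
    r_unrestricted_restricted,
)
print(round(partial, 2))
```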

With respect to digit speed, Spring and 
Capps (1974; also Spring & Farmer, 1975) 
reported significant differences between 
normal and dyslexic readers in the time 
taken to name a row of typed digits. In the 
present research, such reader differences 
were detectable only in the correlational 
analysis. Differences in naming digits may 
be more pronounced between skilled readers 
and severe dyslexics. Our less skilled 
subjects were about 1 year below grade level 
in reading, while the subjects of Spring and 
Capps (1974) were more than 2 years below 
grade level. Furthermore, the Spring and 
Capps (1974) dyslexic group was nearly as 
far below grade level in arithmetic as in 
reading. 

In this context, the significance of the 
present results is the demonstration that the 
low levels of reading skill found in a normal 
classroom are largely associated with coding 
problems specific to alphabetic inputs. In- 
deed, there may be differences in general 
name retrieval processes between normal 
readers and the extreme dyslexic (Denckla 
& Rudel, 1976). However, for the run-of-the- 
mill range of reading skill, the present results 
suggest that coding processes, over and 
above general name retrieval, are a major 
factor. 

Although the locus of decoding latency 
differences remains to be identified, the 
present research more firmly establishes the 
general components that are responsible. 
First, at one level, the possibility that voca- 
lization latency differences are response ar- 
tifacts is clearly ruled out. Second, the 
possibility that name retrieval differences 
are the major factor is ruled out. Third, the 
ability of the less skilled reader to use con- 
straining knowledge as well as the skilled 
reader is established. The persistent dif- 
ferences between skilled and less skilled 
readers in reaction times to words and 
pseudowords seem to be due to processes of 
verbal coding, including processes operating 
on subword units. 


Averaging Model of Information Integration Theory 
Applied in the Classroom 


Elsie Simms 
University of Arizona 


The two purposes of this study were to evaluate the applicability of the aver- 
aging model of the information integration theory to attitude formation in a 
classroom situation and to compare the results of an in-classroom and an in- 
laboratory application of equivalent stimuli. The in-classroom application 
consisted of 11 lectures spanning 3½ weeks. The in-laboratory application re- 
quired 1 hour and used printed stimulus material. The stimuli were descrip- 
tive information concerning outstanding American women. The results con- 
firmed both the information integration theory and the averaging model for 
both experiments. No difference was found between the results of the two 
treatment methods, which suggests that the averaging model of information 
integration theory could be utilized to determine the necessary classroom 
learning experiences to accomplish attitudinal objectives. 


One of the most prominent of the infor- 
mation-processing theories is the informa- 
tion integration theory of attitude change 
developed over the past one and one-half 
decades by Anderson (1959, 1962, 1964, 1970, 
1971). This theory of attitude change pro- 
poses that individuals average new infor- 
mation they receive with their prior attitude, 
thus producing an attitude change. The 
theoretical model can be written as a 
weighted sum: 


$$R = C + \sum_i w_i s_i .$$


The R usually represents the overt response 
but could be considered the underlying at- 
titude. The C allows for an arbitrary zero in 
the response scale and is a constant. The 
summation is over any prior attitude and all 
informational stimuli relevant to the atti- 
tude object. The s represents the scale value 
of the informational stimuli along a favor- 
able-unfavorable continuum, and the w 
represents the weight or psychological im- 
portance of the respective piece of informa- 
tion. The averaging model requires the 
weights to sum to one. 
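
The difference between averaging and adding can be made concrete with a small numerical sketch. The scale value of 8 and the equal weights are hypothetical, chosen only for illustration, and the sketch ignores any prior attitude, which would enter the sum as one more weighted value.

```python
def averaging_response(scale_values, weights, constant=0.0):
    """R = C + sum(w_i * s_i), with the weights normalized to sum to one."""
    total_weight = sum(weights)
    normalized = [w / total_weight for w in weights]
    return constant + sum(w * s for w, s in zip(normalized, scale_values))

def adding_response(scale_values, weights, constant=0.0):
    """The same weighted sum, but without normalizing the weights."""
    return constant + sum(w * s for w, s in zip(weights, scale_values))

HIGH = 8.0  # hypothetical scale value for a highly favorable piece of information

two_high = [HIGH] * 2    # a two-piece sequence, HH
four_high = [HIGH] * 4   # a four-piece sequence, HHHH

avg_two = averaging_response(two_high, [1.0] * 2)
avg_four = averaging_response(four_high, [1.0] * 4)
add_two = adding_response(two_high, [1.0] * 2)
add_four = adding_response(four_high, [1.0] * 4)

print(avg_two, avg_four)  # averaging: the two sequences give the same response
print(add_two, add_four)  # adding: the longer sequence gives the larger response
```

Under these assumptions, doubling the amount of equally favorable information leaves the averaged response unchanged but doubles the summed one, which is the kind of contrast that separates the two models.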

Anderson and his colleagues have applied 
the theory of information integration to 
various judgmental and attitude change 
situations. Early work involved tests of the 
model for opinion change (Anderson, 1959), 
impression formation (Anderson, 1962), and 
clinical judgment (Anderson, 1972). An- 
derson’s averaging model has received in- 
creasing experimental support through these 
many judgmental tasks. 

Sawyers and Anderson (1971) tested in- 
formation integration theory in attitude 
change, using paragraphs of varying favor- 
ableness as stimuli. Their results sup- 
ported the linear model. A similar study 
(Anderson, 1973) confirmed the averaging 
model of information integration and ex- 
tended previous findings of social judgment 
(Anderson, 1971). In the report of this latter 
study, Anderson expressed the hope that the 
laboratory results would have relevance to 
the practical problems of attitude formation 
in the classroom. 

Educational studies of the communication 
content of classroom materials have usually 
centered around the theories proposed by 
Janis and Hovland (1959) on the persuasive 
content and conclusion drawing. Only An- 
derson's (1973) application of information 
integration theory to attitudes about United 
States presidents concerned itself with an- 
alyzing and measuring the specific effects of 
content and the process under which this 
[...] attitudes. 

However, laboratory and classroom ap- 
plications of stimuli in attitude formation 
may differ in time span, discounting, and 
possible inoculation effects. Laboratory 
applications usually consume only a few 
hours, whereas most classes span a number 
of weeks. This study attempted to explore 
the application of the averaging model of 
information integration to attitude forma- 
tion in a classroom situation and to compare 
the results of an in-classroom application 
with those of an in-laboratory application of 
stimuli. The stimuli are assumed to be 
equally weighted, and therefore the theory 
predicts parallelism within a factorial design. 
It is predicted that (a) the averaging model 
of information integration theory can be 
demonstrated in a classroom situation that 
spans 3½ weeks and that (b) there will be no 
difference between the results of the in- 
classroom application of the stimuli and the 
results of the in-laboratory application of the 
stimuli. 


Method 


The subjects received sequences of descriptive in- 
formation concerning an outstanding American woman 
and then rated the woman on admirability. The de- 
scriptive information was given by printed material for 
the in-laboratory experiment and by oral lectures for 
the in-classroom experiment. There were eight se- 
quence conditions (Table 1), which varied in compli- 
mentary value, H (high), M (medium), or L (low). Each 
subject served in all eight conditions for both experi- 
ments. 


Design 


The same design was used for the in-laboratory and 
the in-classroom experiments. There were eight dif- 
ferent sequence conditions, 16 outstanding American 
women, 8 for each experiment, and eight ordinal posi- 
tions for the application of the eight sequences received 
by each subject. An 8 X 8 Greco-Latin square was used 
to provide balance for these three variables. Six 
subjects were randomly assigned to each row of this 
Greco-Latin square. Thus subjects received informa- 
tion about each woman equally often with each se- 
quence type and at each ordinal position. This allowed 
each subject to serve as her own control. The inde- 
pendent variable was the favorableness of the infor- 
mation. 
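
The article does not print the square that was used. As an illustration only, the sketch below builds one 8 X 8 Greco-Latin square by a standard finite-field construction: two Latin squares of the form a·row + col over GF(8) with different multipliers are orthogonal. The assignment of rows to subject groups, columns to ordinal positions, and the two symbols to sequence condition and woman is an assumption made for the example.

```python
def gf8_multiply(a, b):
    """Multiply two elements of the finite field GF(8), reducing by x^3 + x + 1."""
    product = 0
    for _ in range(3):
        if b & 1:
            product ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return product

def latin_square(multiplier):
    """Cell (row, col) holds multiplier * row + col, computed in GF(8)."""
    return [[gf8_multiply(multiplier, row) ^ col for col in range(8)]
            for row in range(8)]

def greco_latin_square():
    """Superimpose two orthogonal 8 x 8 Latin squares into (first, second) pairs."""
    first, second = latin_square(1), latin_square(2)
    return [[(first[r][c], second[r][c]) for c in range(8)] for r in range(8)]

square = greco_latin_square()

# Assumed reading: rows are subject groups, columns are ordinal positions,
# the first symbol is the sequence condition and the second is the woman (0-7).
for row in square:
    print(" ".join(f"{sequence}{woman}" for sequence, woman in row))
```

Each sequence condition and each woman appears once in every row and column, and every condition-woman pairing occurs exactly once in the square, which is the balance property the design relies on.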

The first six sequences of four informational pieces 
in Figure 1 form a 2 X 3 factorial design and were used 
to test the parallelism prediction. The row factors are 
the first pairs of information, HH and LL. The column 
factors are the second pairs of information, HH, MM, 
and LL. The last two sequences of two informational 
pieces were used to test the averaging versus adding 
prediction. 
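
The parallelism prediction for the 2 X 3 design can be sketched numerically. The scale values for H, M, and L below are hypothetical, the weights are taken as equal, and any prior attitude is ignored; under those assumptions the predicted rating is simply the mean scale value of the four pieces.

```python
SCALE = {"H": 8.0, "M": 5.0, "L": 2.0}  # hypothetical scale values

def predicted_rating(sequence):
    """Equal-weight averaging prediction: the mean scale value of the sequence."""
    values = [SCALE[piece] for piece in sequence]
    return sum(values) / len(values)

rows = ["HH", "LL"]            # first pair of information
columns = ["HH", "MM", "LL"]   # second pair of information

# Predicted rating for each of the six four-piece sequences.
table = {r: [predicted_rating(r + c) for c in columns] for r in rows}

# Difference between the HH row and the LL row in each column.
row_gaps = [hh - ll for hh, ll in zip(table["HH"], table["LL"])]

print(table)
print(row_gaps)
```

The gap between the two rows is the same in every column, so the two row curves plot as parallel lines; a reliable Row X Column interaction in the observed ratings would count against the equal-weight averaging model.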


Materials 


The stimuli consisted of information from biogra- 
phies and historical writings concerning 16 outstanding 
American women. Ten paragraphs, four of high favor- 
ableness, two of neutral favorableness, and four of low 
favorableness for each of eight of these women were 
used in the in-laboratory treatment. Ten 1½-minute 
lectures, in the same proportions of favorableness, were 
constructed for the eight remaining women and were 
used in the in-classroom treatment. The 16 women 
were Jane Addams, Eugenie Anderson, Clara Barton, 
Frances Payne Bolton, Carrie C. Catt, Dorothea Dix, 
Hetty Green, Oveta Culp Hobby, Clare Boothe Luce, 
Patsy Mink, Frances Perkins, Jeannette Rankin, 
Florence Sabin, Maria Sanford, Margaret Sanger, and 
Carey Thomas. Three paragraphs, one of each level of 
favorableness, will illustrate the type of material 
used. 

Frances Payne Bolton (high). Conservative but 
open-minded and capable of independence, articulate, 
and dynamic, she was one of the better congresswomen. 
She left her mark on federal legislation. Although she 
mostly voted loyally with her party, she took at times 
a remarkably independent course. One writer credits 
her with poise, self-assurance, intelligence, energy, and 
an amazing memory. She was also generous with her 
personal wealth. In 1929 she provided a gift of 
$1,250,000 for a school of nursing at Western Reserve 
University; later she developed an interest in the pres- 
ervation of our country's historical sites and invested 
generously in their acquisition and upkeep. She started 
her committee assignments as a member of the Indian 
Affairs Committee, but later she came to the prestigious 
House Foreign Affairs Committee. In her first 14 years 
she proposed scores of bills, many of them concerning 
social, educational, and industrial matters and some 
concerning public hygiene (Loth, 1968). 

Maria Sanford (neutral). It was characteristic of 
Maria Sanford that she gave talks to teachers on 
neatness and order. She had the New England Puritan 
belief that cleanliness is next to godliness. She taught 
that neatness of person brings carefulness of morals, a 
very naive idea. She gave the teachers other Puritan 
sentiments, such as the ideas that goodness of nature 
is better than beauty of face, and that it is important not 
to be sentimental, and that one should avoid the habit 
of reading either trashy or “goody” books and instead 
store the mind with beautiful things. She urged them 
to give more attention to the useful than the useless in 
dress, advice that she herself adhered to so intensely 
that in later years her students complained of the un- 
attractiveness of her appearance. One of her many 
mottos for her own guidance at this time was “Do 
something steadily: Forty years studying birds” 
(Bernard, 1964). 


          HH      MM      LL      00 
HH      HHHH    HHMM    HHLL      HH 
LL      LLHH    LLMM    LLLL      LL 


Figure 1. Sequence types. (H = high favorableness, 
M = moderate favorableness, and L = low favorable- 
ness.) 

Hetty Green (low). Hetty always swore that she was 
very fond of her Aunt Sylvia Ann, but you would never 
have dreamed it to see the two together. Hetty, at the 
age of 25, stormed about her aunt’s house, ordering the 
servants around, bullying the poor invalid herself, and 
quarreling with everybody. If Aunt Sylvia wanted to 
hire a carriage, Hetty told her it was a sinful extrava- 
gance. If Aunt Sylvia had guests to dinner, Hetty would 
insist on borrowing somebody’s maid for the occasion 
instead of hiring a butler. Already she thought of her 
Aunt Sylvia’s fortune as her own, and she was not going 
to see it wasted. At last there was an explosion—and 
she and her aunt parted forever—when her aunt made 
plans to have a wing added to the house. Hetty stormed 
and wept, and flung herself down on the floor like a 
naughty child. The aunt packed up her things and left 
for their country home; Hetty was not allowed to go 
along. Hetty and her father went to New York to live. 
There she spent her time worrying for fear her angry 
aunt would change her will and leave her money to 
somebody else (Boyden & Moore, 1935). 


Procedure 


The in-laboratory experiment was a replication of 
Anderson’s (1973) study of information integration 
theory applied to attitudes about United States presi- 
dents, with paragraphs about outstanding American 
women rather than the presidents. Each subject read 
eight booklets containing either two or four paragraphs 
describing behavior of one outstanding woman. After 
the instructions, each subject received a four-page 
booklet with one paragraph per page. Subjects were 
allowed 1 minute to read each page, and at that time, the 
experimenter said, “Turn.” When the last paragraph 
was read, the experimenter said, “Rate.” The rating 
scale was attached to the back of each booklet. Immediately
after finishing each booklet, the subjects
rated the women on a 1-to-10 scale of admirability. The 
booklets were collected, the next sequence was dis- 
tributed, and the procedure was repeated until all eight 
row combinations were used.

For the in-classroom experiment, lectures were delivered
to six subjects at a time. Each subject heard
eight 3- or 6-minute lectures. Thus, each subject heard
all sequences about all of the women, with one woman 
and one sequence type for each lecture. Two like de- 
scriptions were given at each session. One minute was 
allowed for questions or discussion following the lecture. 
After a sequence series was completed, the 1-to-10
rating scale of admirability was distributed, marked by 
the subjects, and collected. Two days elapsed between 
lectures on like favorableness of a woman. The inde- 
pendent variable of interest was the favorableness of the 
lecture units. The lecture treatment required a period
of 34 weeks. The subjects in the in-classroom experiment
were not restrained from discussing the lectures
between sessions.


Subjects 


The subjects were 48 female junior and senior education
major undergraduates at the University of Arizona.
Each subject was paid $10 for participating in the
entire experiment. 


Results 


The information integration model pre- 
dicts parallelism within a factorial design 
with equally weighted stimuli. To test the 
model, I applied a Subjects × Treatment ×
Treatment analysis of variance to a 2 × 3
factorial design of the sequenced paragraphs 
and of the sequenced lectures. This paral- 
lelism is demonstrated for each of the two 
treatments in Figures 2 and 3. The Row ×
Column interaction was used to test the 
model and was applied separately to the in- 
laboratory and the in-classroom rating re- 
sults. 

The interaction for the in-classroom 
treatment was not significant, F(2, 94) =
1.68, nor was the interaction significant for 
the in-laboratory treatment, F(2, 94) = 2.53. 
These Row × Column interactions are exact
statistical tests of parallelism. These results 
confirming the parallelism prediction for the 
two experimental conditions give added 
support to the theory of information inte- 
gration. 
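
The parallelism prediction can be illustrated numerically. The following is a minimal sketch, not the analysis of variance reported above: it takes the cell means shown in Figures 2 and 3, assumes that their decimal points fall after the first digit (as the 1-to-10 scale implies), and assumes that rows correspond to the first pair of paragraphs (HH or LL) and columns to the second pair (LL, MM, HH). The function names are illustrative only. Under perfect parallelism the gap between the two rows would be the same in every column and every interaction residual would be zero.

```python
# Cell means read from Figures 2 and 3 (assumption: decimal point after the
# first digit, e.g. 6.06 rather than 606). Columns are LOW, NEUTRAL, HIGH.
CLASSROOM = {"HH": [6.06, 7.89, 9.06], "LL": [4.54, 5.63, 7.52]}
LABORATORY = {"HH": [6.54, 7.81, 8.94], "LL": [4.21, 5.15, 7.38]}


def row_gaps(cells):
    """Difference between the HH row and the LL row in each column."""
    return [round(h - l, 2) for h, l in zip(cells["HH"], cells["LL"])]


def interaction_residuals(cells):
    """Cell mean minus (row mean + column mean - grand mean) for each cell."""
    rows = list(cells)
    n_cols = len(cells[rows[0]])
    grand = sum(sum(cells[r]) for r in rows) / (len(rows) * n_cols)
    row_mean = {r: sum(cells[r]) / n_cols for r in rows}
    col_mean = [sum(cells[r][j] for r in rows) / len(rows) for j in range(n_cols)]
    return {
        r: [round(cells[r][j] - row_mean[r] - col_mean[j] + grand, 3)
            for j in range(n_cols)]
        for r in rows
    }


if __name__ == "__main__":
    for name, cells in (("classroom", CLASSROOM), ("laboratory", LABORATORY)):
        print(name, "row gaps:", row_gaps(cells))
        print(name, "residuals:", interaction_residuals(cells))
```

The residuals describe only how far the observed means depart from parallel lines; whether that departure is reliable is what the Row × Column interaction test addresses.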


Averaging Versus Adding Prediction 


To test the averaging hypothesis of information
integration in each of the treatment groups, I used
four Subject × Treatment analyses of variance. These
compare the responses to the HHMM sequence with the
responses to the HH-only sequence and compare the
responses to the LLMM sequence with those to the
LL-only sequence. Two analyses of variance were
performed for the in-laboratory treatment, and two
analyses of variance were performed for the
in-classroom treatment. Table 1 presents the means and
standard deviations of the responses to the four pieces
of information in each treatment situation.


Table 1
Classroom and Laboratory Response Means
and Standard Deviations for the Various
Sequences

                       Sequence
Condition      HHMM    HH      LLMM    LL
Classroom
  M            7.90    8.81    5.63    3.44
  SD           1.51    1.33    2.07    1.64
Laboratory
  M            7.81    8.35    5.15    4.33
  SD           1.65    1.19    2.26    2.26

Note. H = high; M = medium; L = low.


LOW            NEUTRAL        HIGH
HHLL = 6.06    HHMM = 7.89    HHHH = 9.06
(SD = 2.18)    (SD = 1.49)    (SD = 1.06)

LLLL = 4.54    LLMM = 5.63    LLHH = 7.52
(SD = 2.3?)    (SD = 2.04)    (SD = 1.88)

Figure 2. In-classroom mean judgments of the general
admirability for the six sequences of information.


The averaging hypothesis predicts that 
the inclusion of neutral information, MM, 
should lower the response to favorable in- 
formation, HH, and raise the response to 
unfavorable information, LL. An adding 
hypothesis predicts that the inclusion of 
neutral information either will have no effect 
or will increase the response to favorable 
information and lower the response to un- 
favorable information. Table 1 presents the 
results of the averaging process. A signifi- 
cant difference was found for both treatment 


groups in the responses to the HH stimuli 
and the responses to the HHMM stimuli: 
in-classroom, F(1, 47) = 11.04; in-laboratory, 
F(1, 47) = 3.81. A significant difference was 
also found for both treatment groups in the 
comparison of responses to the LL stimuli 
and responses to the LLMM stimuli: in- 
classroom, F(1, 47) = 31.82; in-laboratory, 
F(1, 47) = 6.07. 
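
The direction of these differences can be checked directly against the Table 1 means. The sketch below is illustrative only and does not reproduce the analyses of variance; the function names are hypothetical. It computes how much the response shifts when neutral (MM) paragraphs are added to HH or LL paragraphs and tests whether the shifts have the signs that the averaging hypothesis predicts (downward for HH, upward for LL).

```python
# Mean admirability ratings from Table 1 (1-to-10 scale).
TABLE_1 = {
    "classroom": {"HHMM": 7.90, "HH": 8.81, "LLMM": 5.63, "LL": 3.44},
    "laboratory": {"HHMM": 7.81, "HH": 8.35, "LLMM": 5.15, "LL": 4.33},
}


def neutral_shift(means):
    """Change in the mean rating when MM paragraphs are added to HH or LL."""
    return {
        "HH": round(means["HHMM"] - means["HH"], 2),
        "LL": round(means["LLMM"] - means["LL"], 2),
    }


def consistent_with_averaging(means):
    """Averaging predicts MM lowers the HH rating and raises the LL rating."""
    shift = neutral_shift(means)
    return shift["HH"] < 0 and shift["LL"] > 0


if __name__ == "__main__":
    for condition, means in TABLE_1.items():
        print(condition, neutral_shift(means), consistent_with_averaging(means))
```

An adding process would instead leave both ratings unchanged or move them in the opposite directions, so the sign pattern is the informative part of the comparison.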


In-Laboratory Versus In-Classroom 
Application 


To test the in-classroom versus the in-laboratory
treatments, I used a Subject × Treatment
analysis of variance. No significant
difference between the ratings resulted
from the two methods used to supply the 
information utilized by the subjects in for- 
mulating their ratings of the outstanding 
American women, F(1, 47) = 1.03. 


Discussion 


The difference between the in-laboratory and the
in-classroom treatments was not significant even at
the [level illegible in source]. This suggests that
there is a strong [text illegible in source] the
information integration theory of attitude formation
is a robust theory and is as applicable to attitude
learning in the classroom as it is to attitude
learning in a laboratory experiment. However, it is
recommended that further research be undertaken to
determine whether the effects of uncontrollable
variables present in natural settings would prevent
direct utilization of the information integration
theory in measuring and planning learning experiences
for affective outcomes.


LOW            NEUTRAL        HIGH
HHLL = 6.54    HHMM = 7.81    HHHH = 8.94
(SD = 1.81)    (SD = 1.91)    (SD = .97)

LLLL = 4.21    LLMM = 5.15    LLHH = 7.38
(SD = 1.92)    (SD = 1.98)    (SD = 1.96)

Figure 3. In-laboratory mean judgments of the general
admirability for the six sequences of information.

The experiment was designed to cover
only a third of the time required in the usual 
college course, and each lecture took less 
than 5 minutes. Furthermore, the stimulus 
material was selected for simplicity and for 
the improbability that the subjects would 
have either prior opinions about, or emo- 
tional investment in, the stimulus material. 
Therefore, it is reasonable to assume that
either more complex stimulus material or 
longer time-lapse intervals between lectures 
may help clarify some of the pitfalls of 
applying laboratory findings to the class- 
room situation. 

Within a school situation, one is seldom 
interested in influencing the students' per- 
ception of an individual. Ideals and values 
are much more frequently the target of af- 
fective learning objectives, for example, the 
desirability of federally guaranteed health 
care versus individually financed health care, 
or the desirability of a controlled economy 
versus Smith's (1776) invisible-hand prin- 
ciple. 

Another area that deserves attention is the
problem that may arise when an ideal is 
evaluated on more than one dimension. 
Stimulus information may also contribute 
differentially to such multidimensional 
evaluations. Investigating the utility of the 
information integration theory to explain 
such processes may contribute extensively 
to understanding the teaching and learning 
of attitudes within the educational system. 

Two final points from this study should be 
noted. (a) The subjects favored an averag- 
ing rather than an adding process of infor- 
mation integration in formulating their at- 
titudes. Therefore, educators of young 
adults must not assume that small amounts
of information designed to change affective 
outcomes will have a noticeable impact since 
that information is evaluated, weighted, and 
then averaged with all prior knowledge 
pertinent to the attitude object. (b) Neither 
reinforcement nor feedback was given to the 
subjects to influence their ratings of the 
outstanding American women. This fact 
tends to emphasize the impact produced by 
the meaningful content of information in 
attitude formation and the integral relation 
between the cognitive and affective compo- 
nents of attitudes. 
Verbal and Pictorial Elaboration Effects on Children’s
Long-Term Memory for Noun Pairs 


Daniel W. Kee and Trisha Beuhring 


University of Southern California 


The effects of verbal and pictorial elaboration on children’s long-term memory 
were assessed in two experiments. Second-grade children learned a 20-pair 
list of common nouns to a 16/20 correct criterion by the paired-associate meth- 
od, and long-term retention was assessed after 7 days. Experiment 1 was con- 
ducted with 64 middle-socioeconomic-status white children. Experiment 2 
was conducted with 60 low-socioeconomic-status Mexican-American children. 
The results of both experiments indicate that although elaboration facilitates
the initial acquisition of noun pairs, it neither helps nor hinders their
long-term retention. A manipulation of labeling mode (English vs. Spanish) in the
second experiment did not influence either rate of acquisition or forgetting.


In children’s paired-associate learning
of noun pairs, a comparison is frequently
made between standard and elaborated
presentation. This comparison can be made
in either the verbal or pictorial mode. In the
verbal mode, standard presentation consists
of presenting the to-be-remembered (TBR)
nouns alone (e.g., the chain-the bowl) or
connected by a conjunction (e.g., the chain
and the bowl), whereas elaborated presentation
consists of presenting the nouns connected
by a preposition or verb (e.g., the
chain inside the bowl). Standard presentation
in the pictorial mode consists of depicting
the referents of the TBR nouns side
by side, whereas elaborated presentation
consists of depicting the referents engaged
in a spatial interaction (e.g., a picture of a
chain sitting inside a bowl). The typical
outcome of such comparisons is that elabo- 
rated presentation facilitates children's 
paired-associate learning relative to stan- 
dard presentation (Davidson & Adams, 1970; 
Kee, 1976; Kee & Rohwer, 1973, 1974; Roh- 
wer, Kee, & Guy, 1975). 

Current theory (cf. Anderson & Bower, 
1973; Paivio, 1971; Rohwer, 1973) suggests 
that elaborated presentation prompts the 
encoding of shared referential meaning for 
the otherwise disparate pair members. The 
memory unit for pairs encoded in this man- 
ner is held to be more cohesive, thereby in- 
creasing pair-member availability relative to 
pairs encoded under standard presentation. 
Research has focused primarily on the 
short-term benefits of this type of coding. 
An impressive number of task and subject 
conditions have been surveyed. The liter- 
ature consistently demonstrates elabo- 
rated-presentation effects on the initial ac- 
quisition of noun pairs (cf. Rohwer, 1973). 

Research concerning the effects of elabo- 
rated presentation on children's long-term 
retention has been minimal. Some studies 
have suggested that elaborated presentation 
is associated with less forgetting after re- 
tention intervals of 48 hours (Rohwer, 
Ammon, Suzuki, & Levin, 1971) and 1 week 
(Kerst & Levin, 1973; Reese, 1975). These 
findings, however, are not conclusive because 
the degree of learning was not equated be- 
tween the experimental conditions in these 
studies prior to the retention interval. 
Thus, the retention effects observed may
simply reflect differences in the number of 
items initially acquired, rather than differ- 
ential forgetting (cf. Underwood, 1964). 
Original learning was nominally equated 
in a study by Reese and Parkington (1973) 
that assessed the effects of elaborated pic- 
torial presentation on the 7-day retention of 
noun pairs with 4- and 5-year-old children. 
In order to equate original learning, the 
children were required to learn the paired- 
associate list to a 100% performance criterion 
by a multichoice test procedure. Some re- 
tention benefits were observed for elabo- 
rated presentation. Their results, however, 
may also be inconclusive. The problem is 
that unmeasurable differences in the 
strength of items can occur above the 100% 
performance criterion. Such differences 
would favor elaborated presentation as op- 
posed to standard presentation because the 
100% criterion allows for substantial over- 
learning trials relative to lower performance 
criterions. These overlearning trials are 
known to result in larger gains for subjects 
in fast as opposed to slow learning conditions 
(cf. Postman, 1974; Underwood, 1964). The 
likelihood of detecting overlearning differ- 
ences between conditions is increased by the 
use of a partial performance criterion. It 
would have been more desirable, therefore, 
to have controlled original learning at a cri- 
terion below 100%. Evidence also suggests 
that the multichoice test can differentially 
affect the strength of items at the conclusion 
of acquisition by serving as an extra study 
trial (cf. Postman, Jenkins, & Postman, 
1948). Thus, degree of learning in the Reese 
and Parkington study was probably not ad- 
equately controlled, thereby mitigating a 
successful assessment of the effects of elaborated
presentation on long-term memory.
A study by Olton (1969) deserves recognition.
Olton assessed the effects of preexposure
to nouns embedded within a printed
sentence context on the 7-day retention of
fifth-grade children. Original learning was
controlled by requiring the subjects in the
two preexposure conditions (sentence
context vs. control “number” context) to
learn the list to a predetermined number of
trials. His results indicated that although
preexposure to printed elaborated presentation
facilitated original learning, no differences
between the elaborated and control
conditions were observed on the 7-day re- 
tention test. Some other studies have also 
failed to detect elaborated-presentation ef- 
fects on long-term memory with children 
(Reese, 1970, 1972) and adults (Forbes & 
Reese, 1974). However, original learning in 
these latter studies was either not controlled 
or nominally equated at a 100% performance 
criterion. 

In summary, research consistently dem- 
onstrates elaborated-presentation effects in 
the initial acquisition of noun pairs. Evi- 
dence concerning their effects on long-term 
retention, however, is limited and incon- 
clusive. Thus, the present study was con- 
ducted to provide a more extensive analysis 
of elaborated-presentation effects on chil- 
dren’s long-term memory for noun pairs. 

The degree of learning in this study was 
controlled by requiring the subjects to reach 
an 80% performance criterion. Assessment 
of the number of correct responses on the 
criterion trial provides an indication of the 
actual levels of learning achieved in the dif- 
ferent experimental conditions prior to the 
retention interval. The experimental ma- 
nipulations included in this investigation 
were designed to (a) extend Olton’s analysis 
of verbal elaborated presentation to the 
aural mode of presentation, (b) provide a 
direct comparison between the two methods 
of elaborated presentation frequently used 
to facilitate paired-associate learning in 
childhood: verbal and pictorial, and (c)
assess the generality of the results. Exper- 
iment 1 addressed the first two of these ob- 
jectives. In the experiment, subjects were 
presented with the TBR pairs in both the 
verbal and pictorial modes, while the verbal 
and pictorial context of pair presentation 
(standard vs. elaborated) was varied facto- 
rially. This type of bimodal stimulus pre- 
sentation has been used in a number of pre- 
vious investigations of elaborated-presen- 
tation effects on children's paired-associate 
learning (e.g., Davidson & Adams, 1970; Kee 
& Rohwer, 1973, 1974; Reese, 1965, 1970). 
Furthermore, it provides for an estimate of 
both verbal- and pictorial-elaboration effects
between conditions that have been equated 
for modes of stimulus information. 


Experiment 1 


Method 


Design and subjects. The basic design of the experiment
consisted of a 2 × 2 × 2 crossed factorial with
verbal presentation (standard vs. elaborated), pictorial
presentation (standard vs. elaborated), and subject's
sex (male and female). This basic design was augmented
to include the within-subjects factor of trials
(one to three) in the retention phase of the experiment.

Subjects in the experiment were 64 second-grade 
children. Their modal age was 7 years. The children 
attended school and resided in a middle-socioeco- 
nomic-status (SES) white community neighboring Los 
Angeles, California. Census tract information indicated 
that the children's community had a median educa- 
tional level of 13.8 years and a median income in excess 
of $18,000.

Material and procedures. A 20-pair list of common 
nouns was assembled. All of the nouns were selected 
from first- and second-grade reading primers and were 
highly familiar to second-grade children (e.g., cow, tie, 
glasses, table). Line drawings of the noun referents 
were prepared and photographed onto 35-mm slide 
transparencies. Standard verbal presentation consisted 
of the presentation of the noun labels connected by a
conjunction (e.g., the chain and the bowl), whereas 
elaborated verbal presentation consisted of the noun 
labels connected by a preposition (e.g., the chain inside 
the bowl). Standard pictorial presentation consisted 
of the depiction of the noun referents side by side, 
whereas elaborated pictorial presentation consisted of
the depiction of the object referents engaged in a spatial 
interaction (e.g., a picture of a chain inside a bowl). It 
will be recalled that the experiment was designed so that 
all subjects would be presented with the TBR pairs in 
both the verbal and pictorial modes, while the verbal 
and pictorial context of pair presentation was varied 
factorially. Thus, a subject in the standard verbal- 
elaborated pictorial presentation condition, for exam- 
ple, would hear the labels of the nouns connected by a 
conjunction, in addition to seeing the pictorial presen- 
tation of the referents engaged in a spatial interaction. 
A total of nine random orders of the list were con- 
structed. An arrangement of six of these orders was 
used in the acquisition phase of the experiment, which
allowed for three alternating cycles of study and test. 
The final three list orders were used for test trials in the 
retention phase.

Subjects were tested individually in a room at the 
participating school. Testing was conducted by a white 
female. Subjects were seated at a small table on which 
a slide-screen projection unit was located. A study-test 
paired-associate procedure was used during acquisition. 
The subjects were informed that 20 pairs of nouns would 
be presented and that they should learn them in such
a way as to be able to produce the name of one member
of the pair when presented with the other. Pictorial
presentation was made by a 35-mm Kodak slide projector.
The rate of presentation on study trials was 4
sec per pair. As each pair was presented visually, a
cassette recorder (Wollensak #2551), synchronized
with the slide projector, presented the verbal information
in an appropriate manner (i.e., standard or elaborated).
The test-trial rate was also 4 sec per pair. On
the test trial, the subjects were presented with one 
member from each pair (presentation was both verbal 
and pictorial) and were asked to verbally recall the as- 
sociate. Subjects were required to learn the pairs to a 
criterion of 16/20 correct in the acquisition phase. A 
strict scoring procedure was used in which only re- 
sponses that matched experimenter-provided labels 
were accepted as correct. This provision was made in 
order to standardize the establishment of criterion 
performance across subjects. It will be recalled that in 
the construction of the paired-associate lists for ac- 
quisition, provisions were made for three alternating 
cycles of study and test. If a subject required additional 
practice to reach criterion, the set of three study-test 
trials was recycled in sequence. 
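
The acquisition rule described above can be summarized in a short sketch. It is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, and the per-trial scores in the usage example are invented for demonstration. The sketch finds the first test trial on which a subject reaches the 16/20 criterion and shows how a set of three study-test cycles would be reused when more than three trials are needed.

```python
CRITERION = 16  # correct responses required out of 20 pairs


def trials_to_criterion(correct_per_trial, criterion=CRITERION):
    """Return the 1-based test trial on which the criterion is first met.

    Returns None if the criterion is never reached in the scores supplied.
    """
    for trial, n_correct in enumerate(correct_per_trial, start=1):
        if n_correct >= criterion:
            return trial
    return None


def study_test_schedule(n_trials, n_cycles=3):
    """Pair each trial number with the study-test cycle (1..n_cycles) it reuses."""
    return [(trial, (trial - 1) % n_cycles + 1) for trial in range(1, n_trials + 1)]


if __name__ == "__main__":
    # Hypothetical scores for one subject across successive test trials.
    scores = [9, 13, 15, 17]
    print(trials_to_criterion(scores))
    print(study_test_schedule(len(scores)))
```

In this hypothetical case the fourth trial reuses the first study-test cycle, which is the recycling described in the text.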

Subjects were required to return after 7 days for the 
retention test. This test consisted of three cued-recall 
trials. Stimulus cue presentation was both verbal and 
pictorial. The test-trial rate was subject paced; however,
if a subject failed to provide a response within 10
sec, the stimulus cue was advanced. No feedback was 
provided to the subjects concerning whether or not their 
responses were correct on these test trials. 


Results and Discussion 


The results are reported in two parts: (a) 
acquisition and (b) retention. Unless 
specified otherwise, the Type I error rate was 
set at .05. 

Acquisition. Table 1 presents the mean
number of trials to criterion as a function of
verbal and pictorial presentations. As can
be seen, both elaborated verbal and elaborated
pictorial presentations facilitated
noun-pair learning, F(1, 56) = 19.17; F(1, 56)
= 16.66, respectively. The interaction between
the two factors was also significant,
F(1, 56) = 9.26. Descriptively, the form of
the interaction indicates that the combined
elaborated verbal-elaborated pictorial presentation
did not enhance performance relative
to elaborated verbal or elaborated
pictorial presentation alone. The factor of
sex and its interactions with the manipulated
factors were not associated with significant
effects (all Fs < 1).


Table 1
Mean Number of Trials to Criterion as a
Function of Presentation Condition in
Experiment 1

Pictorial            Verbal presentation
presentation      Standard    Elaborated    M
Standard          6.13        3.00          4.57
Elaborated        3.13        2.56          2.85
M                 4.63        2.78

Note. MSe(56) = 2.84.


Table 2
Mean Number Correct on the Criterion Trial
as a Function of Presentation Condition in
Experiment 1

Pictorial            Verbal presentation
presentation      Standard    Elaborated    M
Standard          17.06       17.13         17.10
Elaborated        17.56       17.56         17.56
M                 17.31       17.35

Note. MSe(56) = 1.59.
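
The form of the interaction in trials to criterion can be made concrete with the Table 1 cell means. The following is a minimal sketch with hypothetical function names. One assumption is flagged in the code: the standard-standard cell is taken as 6.13, the value implied by both printed marginal means (4.57 and 4.63), rather than the 6.18 that appears in some copies of the table. The sketch recomputes the marginal means and the interaction contrast, that is, the benefit of verbal elaboration under standard pictures minus its benefit under elaborated pictures.

```python
LEVELS = ("standard", "elaborated")

# Mean trials to criterion, keyed by (pictorial, verbal) presentation.
# Assumption: the standard-standard cell is 6.13, as implied by the marginals.
CELLS = {
    ("standard", "standard"): 6.13,
    ("standard", "elaborated"): 3.00,
    ("elaborated", "standard"): 3.13,
    ("elaborated", "elaborated"): 2.56,
}


def marginal_means(cells):
    """Return (pictorial marginals, verbal marginals) for equal cell sizes."""
    pictorial = {p: sum(cells[(p, v)] for v in LEVELS) / 2 for p in LEVELS}
    verbal = {v: sum(cells[(p, v)] for p in LEVELS) / 2 for v in LEVELS}
    return pictorial, verbal


def interaction_contrast(cells):
    """Verbal-elaboration gain with standard pictures minus gain with elaborated pictures."""
    gain_standard_pictures = cells[("standard", "standard")] - cells[("standard", "elaborated")]
    gain_elaborated_pictures = cells[("elaborated", "standard")] - cells[("elaborated", "elaborated")]
    return gain_standard_pictures - gain_elaborated_pictures


if __name__ == "__main__":
    pictorial, verbal = marginal_means(CELLS)
    print({k: round(v, 2) for k, v in pictorial.items()})
    print({k: round(v, 2) for k, v in verbal.items()})
    print(round(interaction_contrast(CELLS), 2))
```

A contrast well above zero reflects the pattern described in the text: either form of elaboration alone removes most of the extra trials, and combining the two adds little further benefit.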

Criterion-trial performance was assessed
in order to determine if original learning was
successfully equated prior to the retention
interval. Both strict and lenient scorings of
criterion-trial performance were made. The 
strict procedure counted correct only those 
responses that were identical to the labels 
provided in acquisition, whereas the lenient 
procedure also accepted synonyms. A pre- 
liminary analysis indicated that the two 
measures produced identical patterns of 
performance across the experimental con- 
ditions. Thus, only the analysis of the len- 
ient scoring procedure will be treated be- 
cause it affords the more sensitive index of 
the number of pairs acquired. The means 
for the number of correct responses given on 
the criterion trial as a function of verbal and 
pictorial presentations are presented in 
Table 2. As can be seen, the conditions are 
equivalent. Analysis of variance failed to 
detect any reliable source of variance (p > 
.10). 

Retention. Loss scores were computed 
for each subject based on the number correct 
on the criterion trial (lenient scoring) minus 
the number correct on each of the retention 
test trials (lenient scoring). The means for 
the conditions collapsed over the factor of 
subject’s sex are presented in Table 3. 
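
The loss-score computation can be sketched as follows. This is an illustration only: the function names are hypothetical and the subject data in the usage example are invented, not taken from the experiment. Each subject contributes one loss score per retention trial, defined as the number correct on the criterion trial minus the number correct on that retention trial.

```python
def loss_scores(criterion_correct, retention_correct):
    """Loss on each retention trial: criterion-trial correct minus trial correct."""
    return [criterion_correct - n for n in retention_correct]


def mean_loss_by_trial(subjects):
    """Average the loss scores across subjects, separately for each trial.

    `subjects` is a list of (criterion_correct, [trial 1, trial 2, ...]) pairs.
    """
    all_losses = [loss_scores(crit, trials) for crit, trials in subjects]
    n_trials = len(all_losses[0])
    return [sum(s[t] for s in all_losses) / len(all_losses) for t in range(n_trials)]


if __name__ == "__main__":
    # Hypothetical data for two subjects and three retention trials.
    subjects = [(17, [12, 13, 13]), (18, [14, 14, 15])]
    print(mean_loss_by_trial(subjects))
```

Because the score is anchored to each subject's own criterion-trial performance, a smaller value indicates less forgetting over the retention interval.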

Analysis of variance with repeated measures
revealed that verbal presentation, F(1,
56) = 1.47, pictorial presentation, F(1, 56) =
.98, and the interaction between these two
factors, F(1, 56) = 1.21, were not associated
with significant effects. This outcome indicates
that neither of the methods of elaborated
presentation (i.e., verbal or pictorial)
that customarily facilitate noun-pair learning
has an effect on the long-term retention
of noun pairs when the degree of original
learning is equated.¹

Only two sources of significant effects
were detected in the analysis of loss scores.
A trials effect was observed, F(2, 112) =
21.07, that indicated that the number of
items lost (4.92, 4.31, 4.11) declined over the
three test trials. This finding suggests that
the context of original learning may have
been reinstated by the repeated testing,
thereby increasing the probability of correct
response selection and retrieval. Finally, a
four-way interaction was observed between
the factors of Verbal Presentation, Pictorial
Presentation, Subject’s Sex, and Trials, F(2,
112) = 3.92. The form of this interaction,
however, did not serve to qualify previous
conclusions drawn about the effects of verbal
and pictorial presentations. Thus, this interaction
will not be treated in further detail.

Both extralist and within-list intrusions
were tabulated. The rate of extralist intrusions
was extremely low (M = .30 of an
item). Analysis of variance failed to detect
any reliable source of variance. The mean
rate of within-list intrusions was 1.73 of an
item. Analysis of variance revealed only one
significant effect: trials, F(2, 112) = 3.88.
This effect indicates that the number of
within-list intrusions (2.03, 1.69, 1.47) declined
over the three test trials. This outcome
is consistent with the notion that repeated
testing in the retention phase serves to
reinstate the context of original learning.
Thus, the number of misplaced responses
(i.e., within-list intrusions) would be expected
to decline as the subject became more
accurate at correct response selection and
retrieval.


Table 3
Mean Loss Scores as a Function of Presentation Condition and Trial in Experiment 1

                      Trial 1*                 Trial 2                  Trial 3
Pictorial       Standard   Elaborated    Standard   Elaborated    Standard   Elaborated
presentation    verbal     verbal        verbal     verbal        verbal     verbal
Standard        5.44       5.00          4.44       5.06          4.44       4.50
Elaborated      3.75       5.50          3.00       4.75          2.94       4.56

Note. MSe(56) = 26.17, MSe(112) = .54.
* Trial 1: MSe(56) = 9.87.


¹ Although the assessment of the number of correct
responses given on the criterion trial of acquisition indicated
that the experimental conditions had been
successfully equated prior to the retention interval, it
is possible that this index was not sensitive to variations
in degree of overlearning. In the experiment, overlearning
would probably have introduced a bias favoring
the standard presentation condition, because subjects
in this condition required about twice as many study-test
cycles to reach the 16/20 performance criterion as
the subjects in the elaborated conditions. Thus, the
benefit of elaborated presentation on long-term memory
may have been obscured by the method used to equate
original learning. In order to assess this possibility, an
analysis of loss scores was conducted for items that had
only been recalled once during the acquisition phase of
the experiment. This conditional analysis should serve
to eliminate most of the pairs overlearned in the standard
presentation condition. An analysis of variance
was performed, and it failed to reveal any reliable differences
between the experimental conditions, largest
F(1, 56) = 1.93, p > .10. Thus, there is no evidence to
support the proposition that differential overlearning
serves to mask elaborated-presentation effects at retention.

The results of this first experiment suggest 
that neither type of elaborated presentation 
(i.e., verbal or pictorial) that typically facil- 
itates noun-pair learning in childhood is 
associated with a retention effect when de- 
gree of learning is controlled. Subjects in 
this experiment were drawn from an ele- 
mentary school serving a predominantly 
white middle-SES community. Some pre- 
liminary evidence concerning the generality 
of the present findings to a different popu- 
lation of children is provided by a replication 
that was conducted with 64 second-grade 
Mexican-American children. These chil- 
dren resided and attended school in a low- 
SES community in Los Angeles, California. 
A 2 X 2 factorial design with verbal presen- 
tation (standard vs. elaborated) and pictorial 
presentation (standard vs. elaborated) was 
used in this experiment. With some minor 
exceptions, the methods and procedures 
were identical to those described for Ex- 
periment 1. The results of this study dif- 
fered from those observed in Experiment 1 
in that elaborated verbal and elaborated 
pictorial presentations facilitated both acquisition
and retention of noun pairs. Table
4 presents the loss scores as a function of
verbal and pictorial presentations collapsed 
over trials for the low-SES Mexican-Amer- 
ican children. A decisive interpretation of 
these retention results, however, is mitigated 
by the discovery of marginally significant (p 
< .10) variations in the degree of original 
learning, which favored the elaborated- 
presentation conditions. These variations 
were observed under a lenient scoring of 
criterion-trial performance in which Spanish 
equivalents and synonyms were counted as 
correct responses in addition to the nominal 
English response labels provided during 
acquisition. 

The minor variation in original learning 
detected in this preliminary study with 
low-SES Mexican-American children may 
be attributed to the method used to establish 
criterion-trial performance in conjunction 
with the language ability of the children. In 
order to provide for a direct comparison with 
Experiment 1, criterion-trial performance 
was established under a strict scoring pro- 
cedure that accepted only the experimen- 
tally provided English labels as correct. 


Table 4
Mean Loss Scores as a Function of
Presentation Condition for Low-SES
Mexican-American Children

Pictorial            Verbal presentation
presentation      Standard    Elaborated    M
Standard          7.79        6.38          7.09
Elaborated        6.21        5.23          5.72
M                 7.00        5.81

Note. MSe(60) = 16.27; SES = socioeconomic status. Loss
scores were computed for each subject based on the number
correct on the criterion trial (lenient scoring) minus the number
correct on each of the retention-test trials (lenient scoring).
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This procedure, although highly successful 
with the predominantly monolingual mid- 
dle-SES white children in the first experi- 
ment, appeared to have underestimated the 
rate of learning for the predominantly bi- 
lingual Mexican-American children who gave 
many responses in Spanish throughout acquisition.
This underestimation of learning
resulted in differential overlearning trials in 
the experimental conditions, hence the difference
in degree of original learning. For
example, a lenient rescoring of trials to criterion
revealed that 18% of the Mexican-American
children would have reached the
16/20 criterion one trial earlier if Spanish 
equivalents had been accepted as correct 
responses. Seventy-three percent of these 
children were in the elaborated conditions. 


Experiment 2 


In order to provide a more decisive as- 
sessment of the generality of the elabo- 
rated-presentation results observed in Ex- 
periment 1, a second experiment was con- 
ducted with a population of low-SES Mexi- 
can-American children. In this experiment, 
a lenient scoring procedure was used 
throughout acquisition in order to assure a 
sensitive index of the number of items 
learned. The primary factor in the experi- 
ment was presentation condition, which 
consisted of a comparison between standard 
pictorial and elaborated pictorial presenta- 
tions. Limitation in the number of children 
available precluded an assessment of both 
types of elaborated presentation (i.e., verbal 
and pictorial). The selection of the pictorial 
mode was based on two considerations. 
First, the results from the earlier studies 
indicated that the retention effects are 
identical for the two types of elaborated 
presentation. Second, the pictorial mode 
minimizes the role of language, thereby 
providing a more robust manipulation of 
elaborated presentation in a population
whose members vary substantially in language
ability.


Method 


Materials and procedures. The materials and 
procedures used were identical to those described for 
the first experiment with the additions to be discussed. 
Similar to the first experiment, subjects were required
to reach a 16/20 correct criterion by the cued-recall test
procedure in acquisition. In order to control for re- 
sponse availability, verbal labels were provided for the 
TBR referents on the study trials. Half of the children 
in each experimental condition were provided with the 
labels in English; half were provided with the labels in 
Spanish. The English labeling replicates the presen- 
tation conditions of the first experiment. A comparison 
between the two labeling conditions will indicate if ei- 
ther rate of learning or retention varies as a function of 
language mode. As previously discussed, a lenient 
scoring procedure was used throughout acquisition 
which accepted correct responses in either English or 
Spanish as well as synonyms of the nominal response 
labels. The experimenter, who was conversant in 
Spanish, was the same white female who tested subjects 
for the first experiment. 

Similar to Experiment 1, retention was assessed after 
a 7-day interval by the administration of the three 
cued-recall test trials. In addition, a fourth test trial 
was administered that consisted of a multichoice test. 
For this test, subjects were provided with a response 
booklet that consisted of 20 pages (one per stimulus 
cue). The booklet was constructed so that (a) each page 
contained 10 different response pictures, (b) each re- 
sponse picture occurred 10 times throughout the book, 
(c) no response picture appeared twice on a page, (d) the 
correct response appeared on the appropriate page, and 
(e) the correct response appeared an equal number of 
times in each of the 10 possible positions on the page. 
Subjects were required to designate, by a pointing re- 
sponse, the correct associate for each stimulus cue 
presented on this multichoice test. This test procedure 
obviates the response-availability requirement associ- 
ated with the cued-recall test, thereby providing a more 
sensitive estimate of the number of associates actually 
retained by the subject. 
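The five booklet constraints listed above can be met by a simple cyclic construction. The Python sketch below is only an illustration and is not the authors' procedure: it assumes the pictures are numbered 0 to 19, that page i tests picture i, and that the nine distractors on each page are the next nine pictures in cyclic order. The function name `build_booklet` and the seed are hypothetical.

```python
import random
from collections import Counter


def build_booklet(n_items=20, per_page=10, seed=0):
    """Return one list of picture indices per page (hypothetical helper)."""
    rng = random.Random(seed)
    pages = []
    for page in range(n_items):
        # Distractors: the next per_page - 1 pictures in cyclic order, so
        # every picture ends up on exactly per_page different pages.
        distractors = [(page + k) % n_items for k in range(1, per_page)]
        rng.shuffle(distractors)
        # Rotate the slot of the correct picture so that each position
        # holds the correct answer equally often across the booklet.
        slot = page % per_page
        pages.append(distractors[:slot] + [page] + distractors[slot:])
    return pages


booklet = build_booklet()
appearances = Counter(picture for page in booklet for picture in page)
correct_slots = Counter(page.index(i) for i, page in enumerate(booklet))
print(sorted(set(appearances.values())))    # how often each picture occurs
print(sorted(set(correct_slots.values())))  # correct answers per position
```

With 20 pages and 10 positions, the rotation places the correct picture in each position on exactly two pages; shuffling only the distractors leaves the other constraints intact.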

Finally, two different paired-associate lists were used. 
One list was identical to that used in the first experi- 
ment; the second was a new list of 20 pairs of common 
nouns. This last provision was made in order to in- 
crease the generality of the elaborated-presentation 
results. 

Design and subjects. The basic design of the ex- 
periment consisted of a 2 X 2 factorial with pictorial 
presentation (standard vs. elaborated) and labeling 
mode (English vs. Spanish). Subjects participated in 
two experimental sessions: (a) acquisition and (b) 
retention. The basic design was augmented at reten- 
tion to include the within-subjects factor of trials (three 
cued-recall and a fourth, multichoice test). 

The subjects were 60 second-grade children. Their 
modal age was 7 years. The children attended school 
and resided in a low-SES Mexican-American commu- 
nity in Los Angeles, California. Census tract data 
revealed that the median educational level of this 
community was 7.9 years and the median income was 
$5,467. All 60 children had Spanish surnames. 
Forty-four were bilingual, 8 spoke only Spanish, and 8 
spoke only English. Language competency was de- 
termined by: (a) a teacher’s evaluation, (b) subject 
self-report, and (c) the child’s ability or inability to 
converse with the experimenter in English and Spanish. 
Equal numbers of children from each of the forenamed 
classifications were represented in each of the four experimental
conditions. It will be recalled that two
different paired-associate lists were used. Because of
a limitation in the number of subjects available, it was
not possible to achieve an exact balancing of subjects
on the lists in each condition. However, in three of the
conditions the difference was one subject, whereas in the
remaining condition the difference was three
subjects.


Results 


Similar to Experiment 1, the results are
reported in two parts: (a) acquisition and
(b) retention. The Type 1 error rate was set
at .05, except where otherwise indicated.
Two complete analyses of the data were
conducted, one for the subsample of 44 bilingual
children and the other for the complete
sample, which included children who
spoke only English and only Spanish. Patterns
of performance in both acquisition and
retention were identical for both samples as
a function of pictorial presentation and labeling
mode. Thus, only the results for the
complete sample will be presented. The
analysis also indicated that the factor of labeling
mode and its interaction with the
other manipulated factors was not associated
with significant effects on the primary dependent
variables of (a) trials to criterion (all
Fs < 1), (b) number correct on the criterion
trial (all Fs < 1), and (c) number of items lost
in cued recall (all Fs < 1, with one exception:
F(1, 56) = 3.86, p > .05) and multichoice (all
Fs < 1). Accordingly, no distinction will be
made as to labeling mode in the results presented
for these variables.

Acquisition. Table 5 presents the results
of the acquisition phase of the experiment.
Rate of learning was indexed by the dependent
variable of trials to criterion. Analysis
of variance revealed a significant pictorial-presentation
effect, F(1, 56) = 35.34, indicating
that elaborated presentation was associated
with a reduction in the number of
trials required to reach criterion. Assessment
of the number of correct responses
given on the criterion trial indicates whether
or not the degree of original learning was
successfully equated prior to the retention
interval. Analysis of variance failed to detect
a reliable difference between the standard-
and elaborated-presentation conditions,
F(1, 56) = 1.86, p > .10.

Retention. Loss scores were computed
for each subject based on the number correct
on the criterion trial minus the number
correct on each retention-test trial. These
results are also presented in Table 5. An
analysis of variance with repeated measures
was conducted on the loss scores from the
three cued-recall test trials. This analysis
indicated that pictorial presentation was not
associated with a significant effect (F < 1).
This outcome, consistent with the findings
from Experiment 1, indicates that elaborated
presentation does not have an effect on
retention when degree of learning is controlled.
The only source of significant effect
observed in this analysis of loss scores was a
trials main effect, F(2, 112) = 28.86, indicating
that the number of items lost declined
over the three cued-recall test trials.

Both extralist and within-list intrusions
were tabulated for the three cued-recall test
trials. The rate of extralist intrusions was
very low (M = .30). Analysis of variance
failed to reveal any significant source of
variance. The rate of within-list intrusions
was 1.92 of an item. Analysis of variance
with repeated measures revealed only one
reliable effect: Labeling Mode X Trials,
F(2, 112) = 3.46. The form of the interaction
indicates that the number of intrusions
declined over trials for English labeling,
whereas the number of intrusions increased
over trials for Spanish labeling. This finding,
however, has no obvious implication for
the elaborated-presentation effects observed
in this study. Thus, it will not be treated in
greater detail.


Table 5

Mean Acquisition and Retention Performance in Experiment 2

                                               Retention: Loss scoresᶜ
Pictorial       Trials to     Correct at
presentation    criterionᵃ    criterionᵇ    Trial 1   Trial 2   Trial 3   Trial 4

Standard          6.23          16.70        5.33      4.50      4.27      1.77
Elaborated        2.83          17.07        5.50      4.83      4.60      1.17

M                 4.53          16.89        5.41      4.67      4.44      1.47

ᵃ MSe(56) = 4.90.
ᵇ MSe(56) = 1.09.
ᶜ Cued recall (Trials 1-3): MSe(56) = 25.45; MSe(112) = .53; cued recall (Trial 1): MSe(56) = 8.58; multichoice (Trial 4): MSe(56) = 8.31.
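The loss-score measure used in the retention analyses is a simple difference: number correct on the criterion trial minus number correct on each retention-test trial. The Python sketch below illustrates that arithmetic only; the two subject records are invented for the example and are not data from the study, and the function names are hypothetical.

```python
def loss_scores(criterion_correct, retention_correct):
    """Items lost on each retention trial relative to the criterion trial."""
    return [criterion_correct - correct for correct in retention_correct]


def mean_loss_by_trial(subjects):
    """Average the per-trial loss scores over (criterion, retention) records."""
    per_subject = [loss_scores(crit, ret) for crit, ret in subjects]
    n_trials = len(per_subject[0])
    return [
        sum(scores[t] for scores in per_subject) / len(per_subject)
        for t in range(n_trials)
    ]


# Hypothetical records: (number correct on the criterion trial,
# [correct on cued-recall Trials 1-3, correct on multichoice Trial 4]).
subjects = [
    (17, [12, 13, 13, 16]),
    (16, [10, 11, 12, 15]),
]
print(mean_loss_by_trial(subjects))
```

Because each subject's loss is taken relative to his or her own criterion-trial score, small differences in degree of original learning do not carry over into the retention measure.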

It will be recalled that a multichoice test 
was administered after the third cued-recall 
test trial. The results indicate that the 
mean number of items lost dramatically 
declined between the third cued-recall test 
(M = 4.44) and the multichoice test (M = 
1.47). This finding is consistent with the
notion that the multichoice-test procedure 
offers a more sensitive index of the number 
of associates actually retained by removing 
the response availability requirement in- 
herent in the cued-recall procedure. Anal- 
ysis of variance performed on the loss scores 
for this test trial failed to reveal a significant 
effect for pictorial presentation (F < 1).
Thus, the results of both the cued-recall and
multichoice tests indicate that elaborated
presentation does not affect the long-term
memory of low-SES Mexican-American
children when degree of learning is equated.


General Discussion 


Elaborated presentation is held to improve
noun-pair memory by prompting a
representational encoding of the pair members
within a common referential event.
The present study offered an evaluation of 
the effectiveness of this coding strategy at 
two different stages in memory: The initial 
acquisition of the noun pairs into memory 
and the subsequent retention of the pairs. 


The results of Experiment 1, conducted with
a sample of middle-SES white children, indicated
that although elaborated verbal and
elaborated pictorial presentations facilitate
the initial acquisition of noun pairs, they
neither help nor hinder their long-term retention.
The results of Experiment 2 serve
to extend the finding that elaborated pre- 
sentation is not associated with a direct 
retention effect to a population of low-SES 
Mexican-American children. 

A number of variables have been identified
in the learning and memory literature
that have a substantial facilitating effect on
the initial acquisition of items but little or no 
effect on their subsequent retention when 
the degree of learning is equated. The list 
of variables includes meaningfulness (Un- 
derwood & Richardson, 1956), printed verbal 
(i.e., verb-connective) elaboration (Olton, 
1969), learning ability (Shuell & Keppel, 
1970), subject’s age (Hasher & Thomas, 
1973), imagery instruction (Hasher, Rieb- 
man, & Wren, 1976), presentation mode 
(Postman, 1978), and orienting task (Nelson 
& Vining, 1978). The outcome of the 
present investigation indicates that the 
forms of elaborated verbal and elaborated 
pictorial presentations assessed in this study 
(i.e., preposition connectives and conjoined 
referents, respectively) should be added to 
the list. 

Although the specific forms of elaborated 
presentation assessed in the present study 
did not have a direct retention effect, caution 
should be exercised in concluding that all 
types of elaborated presentation will operate 
in the same way. Theoretical accounts of 
elaborative coding suggest that the quality 
of the referential event encoded for the pair 
members may affect their accessibility (e.g., 
Rohwer, 1973). Research concerning the 
influence of variation in event quality on 
long-term memory is quite limited but sug- 
gestive. For example, Wolff, Levin and 
Longobardi (1974) report that the 24-hour 
retention of object pairs was enhanced in 
nursery-school-age children by handling. 
That is, subjects who performed an inter- 
action with the toy objects retained more 
than subjects who only observed the inter- 
action. Also, Hasher, Griffin, and Johnson 
(1977) report that with adults, stimulus- 
related elaborators produce superior noun- 
pair retention relative to response-related 
elaborators after a 1-week interval. Although
the results of both of these studies
are consistent with the notion that variation 
in event quality will affect long-term reten- 
tion, neither study included a baseline con- 
dition. That is, Wolff et al. did not include 
a condition in which the performers only 
held the TBR toy objects side by side, and 
Hasher et al. did not include a condition in 
which the members of the TBR pair were 
presented alone. Thus, it cannot be determined
if the effects observed in these studies
reflect facilitation or impairment of long- 
term memory relative to the level of perfor- 
mance that would be associated with a 
standard presentation control. 

In summary, the results of the present 
study demonstrate that although elaborated 
presentation facilitates the initial acquisition
of noun pairs into memory, it does not have
an effect on the subsequent retention of the
pairs. A different way to view the present 
outcome is in terms of cost effectiveness. 
That is, to sustain the same level of noun- 
pair retention after 7 days, elaborated pre- 
sentation requires only half as many 
study-test cycles during acquisition relative 
to standard presentation. This latter view
serves to underscore the role of elaboration
as a facilitator in childhood even in the absence
of a direct retention effect.


Effects of Children’s Family Structure Status on the
Development of Stereotypes by Teachers


John W. Santrock and Russel L. Tracy 
University of Texas at Dallas 


The possibility that teacher ratings of children may indicate a stereotype on 
the part of teachers was investigated by showing 30 teachers a videotape that 
focused on the social interaction of an 8-year-old boy. Half of the teachers 
were given a background information sheet that indicated he was from a di- 
vorced home, while the other half were informed that he was from an intact 
home. The teachers were asked to view the videotape and then to rate the boy 
on a wide range of 11 personality traits (e.g., anxiety, social deviance, and hap- 
piness) and to predict what his behavior would be like in five different school 
situations (e.g., copes with stress and popularity). The Bonferroni multiple- 
comparisons procedure indicated that the teachers rated the divorced child 
more negatively on the following three variables reflecting affective state or
relations: happiness, emotional adjustment, and copes with stress.


Adverse effects of father absence on so- 
cial and cognitive development have been 
reported in several reviews (e.g., Biller, 1970; 
Lynn, 1974; Lamb, 1976). Nonetheless, a 
number of methodological problems have
clouded the interpretation of many of these 
studies (Herzog & Sudia, 1973; Pedersen, 
1976). Information about children in 
studies of father absence has been obtained 
using a variety of sources and methods of 
data collection. Mothers have been inter- 
viewed, teachers have been asked to provide 
trait ratings of children, a host of structured 
and nonstructured personality tests have 
been given, clinicians have been asked to rate 
children on personality characteristics, and 
sometimes behavioral observations have 
been made. 

The use of ratings of children by judges 
who have varying degrees of knowledge 
about them is pervasive in the area of fa- 
ther-absence research (and in the entire field 
of parent-child relations as well). And, 
unfortunately, many researchers also still
choose to use a single measure to evaluate
the child's personality or intelligence. The
interpretation of such ratings, though, is not 
always straightforward. This is because 
rating scale assessments usually agree only 
moderately with more behavioral assessments,
and occasionally, the two methods
yield incompatible results. 

The validity of rating scale assessment 
techniques has been seriously challenged on 
the grounds that ratings may largely reflect 
the “implicit” personality theory of the in- 
dividual rater rather than characteristics of 
the individual being rated (Mischel, 1968, 
1973). 

A study by Santrock (1975) illustrates the 
discrepancy in results often found between 
trait ratings and more behaviorally based 
measures. He examined the effects of father 
absence on moral development. Moral be- 
havior was assessed with resistance-to- 
temptation, self-criticism, altruism, repa- 
ration, and teacher-rating measures; while 
moral judgment was evaluated with three 
Kohlberg items; and moral affect was in- 
vestigated with two story completion items 
maximizing guilt. When relevant variables 
(IQ, socioeconomic status, age, and sibling 
status) were controlled, few differences were 
found between father-absent and father- 
present boys. However, father-absent boys 
were reported by their teachers as less advanced
in moral development than father-present
boys. Further, the sons of divorced
women showed more “social deviation,” ac- 
cording to their teachers. 

The interpretation of the discrepancy 
between Santrock’s (1975) findings based on 
rating versus those obtained from direct 
observation is not clear. On the one hand, 
the teacher ratings may accurately reflect the 
child’s behavior over long periods of time 
and, as such, be less influenced by situational 
variability than on-the-spot behavioral ob- 
servations of the child. On the other hand, 
teacher ratings may reflect the implicit 
theory of the rater and his or her misper- 
ception of the child’s behavior. 

The intent of the present investigation 
was to explore the possibility that teacher 
ratings of children may indicate a stereotype 
on the part of the teacher that entails nega- 
tive expectations for children from divorced 
families and positive expectations for chil- 
dren from father-present families. 


Method 


Subjects and Experimenters 


The subjects were 30 undergraduate and graduate 
students attending the University of Texas at Dallas. 
All were either graduate students who were teachers or 
undergraduate students who were completing or had 
completed their student teaching requirements. The 
mean age of the subjects was 33.8 years. The experimenters
were two graduate student males.


Materials and Procedure 


The subjects were randomly assigned to one of two 
conditions: an intact-family condition (mean age of 
subjects = 32.0 years) or a divorced-family condition 
(mean age of subjects = 35.6 years). Subjects in each 
group were shown, individually, the same 20-minute 
videotape of an 8-year-old boy named David. The 
videotape depicted David at home and on the play- 
ground during peer interaction. Considerable time was 
spent with David and his friends, talking with them and 
allowing them to get used to the videotape camera be- 
fore the actual taping. 

The following written information was given to the 
teachers in both conditions: 


This study deals with the effects of divorce on the 
social and emotional development of children. De- 
spite the fact that divorce is by now commonplace 
in our society, very little is known about its actual 
impact on the child. And, the previous work in this 
area has examined this issue using highly artificial, 
laboratory-type assessments. It is an open question
as to whether such assessments accurately reflect
features of the child’s normal, everyday behav-
ior in his natural environment. 


The present study is designed to examine the ef- 
fects of divorce by looking at the normal, everyday 
behavior of children under naturalistic conditions. 
We think that this method has considerable prom- 
ise of resolving much of the present controversy in 
the literature. At this point in our research pro- 
gram, we want to see whether raters, working inde- 
pendently of each other, can agree reasonably well 
on the ratings of a child’s personality based on 
seeing a short videotape of the child’s behavior. 
We have obtained videotapes of children interact- 
ing with other children in a play-group situation. 
Half of our tapes focus on children from divorced 
homes; the other half picture children from intact 
(nondivorced) homes. What we want you to do 
today is to view a tape of one child and rate this 
child on several personality traits. 


Subjects in the intact-home condition then read the 
following: 


The child you are about to see is from an intact 
home and is named David. He is 8 years old and is 
in the third grade. He is living with his mother, fa- 
ther, and brother in Dallas, Texas. 


Subjects in the divorced-group condition then read 
the following: 


The child you are about to see is from a divorced 
home and is named David. He is 8 years old and in 
the third grade. His father and mother were di- 
vorced about 4 years ago, and he now lives with his 
mother and brother in Dallas, Texas. The extent of 
David’s contact with his father is not known to us. 


Having read the instructions, the subjects were 
asked if they had any questions. If not, the experi- 
menter started the tape and pointed out who David was. 
The subjects were told to attend carefully to him so that 
they could subsequently rate his personality. 

After the subjects had viewed the tape, the experi- 
menter handed the subject two sheets of paper. One 
was called the “Personality Trait Rating Scale”; the other
was labeled “Predicted Behavior in School.” The 
Personality Trait Rating Scale was completed first. 
The subject was asked to rate David on a 1-9 scale (1 = 
very low; 9 = very high) for the following personality 
traits: happiness, gets along with others, need for 
achievement, introversion, emotional adjustment, 
morality, anxiety, aggression, deviance, and sex role 
adjustment. On the Predicted Behavior in School 
sheet, the subject was asked to now think about how 
David might be expected to act in school. David was 
rated on a 1-9 scale for the following items: “How 
much is he likely to break school rules and regulations?” 
(very unlikely to very likely); “How well do you think 
he would cope with stressful situations at school?” (cope 
very poorly to cope very well); “To what extent do you 
feel he would be cooperative with teachers at school?” 
(very cooperative to very uncooperative); “How popular
do you think he would be with his classmates at school?”
(very unpopular to very popular); “Is he likely to assume 
leadership positions at school?” (very unlikely to very 
likely); and, “To what extent is he the type of child you 
could leave the classroom and know that he still would 
do his work and follow your directions of what the class 
should do while you are gone?” (very unlikely to very 
likely). 

Careful attention was given to scheduling videotape 
viewing times in order to minimize the opportunity for 
the subjects to discuss the experiment with each other. 
At the end of the session, the subjects were asked not to
discuss the experiment with their classmates. They 
were then debriefed and informed about the actual 
format of the experiment; namely, all subjects viewed 
and assessed the same child. As part of the debriefing, 
the subject was asked if he or she knew about or could 
guess what we were trying to find out in the present 
study. None of the subjects detected that all were 
seeing the same videotape and that the experiment ac- 
tually concerned stereotypes. 


Results and Discussion 


The means and standard deviations for
each of the 16 items the teachers were asked
to rate are shown in Table 1. The teacher
ratings were subjected to two different
analyses. First, the liberal strategy of
applying a t test separately to each of the 16
scales was followed. Nine of the 16 tests
were significant at p < .05, and 5 of those
were significant at p ≤ .01. Each statistically
significant comparison involved ratings
that were less favorable in the divorced-home
condition as compared with the intact-home
condition. The second data
analysis strategy involved a more conservative
approach to evaluating group differences
on the 16 personality trait items,
namely the Bonferroni multiple-comparisons
procedure (Morrison, 1976). This approach
provides an overall (or experiment-wide)
alpha level. When the Bonferroni
procedure was applied to the 16 comparisons
in the present study, 3 were significant at p
≤ .05: happiness, emotional adjustment,
and copes with stress. Each of these 3 significant
comparisons yielded less favorable
ratings of the divorced child. Interestingly
and serendipitously, these three variables
logically form a cluster that reflect the affective
state or relations of the child.


Table 1

Means, Standard Deviations, and Level of Significance of Teacher Ratings of a Child from a
Divorced and an Intact Home

                              Intact-home       Divorced-home       Level of significance
                                 group              group
                                                                  Individual    Experiment-wide
Personality trait              M        SD        M        SD     comparisonᵃ    comparisonᵇ

Personality Trait Rating Scale
  Happiness               [illegible]  1.19      5.20     1.26      .0001          .01
  Gets along with others      6.47     1.55      4.60     1.68      .004           .13
  Need for achievement        6.80     1.57      6.73     2.02      ns             ns
  Introversion                2.47     1.41      4.00     2.24      .04            ns
  Emotional adjustment        6.47     1.30      4.33     1.45      .0002          .01
  Morality                    5.13     1.22      4.73     1.49      .055           ns
  Anxiety                     4.40     1.96      6.13     2.00      .02            ns
  Aggression                  6.93     1.22      7.21     1.28      ns             ns
  Deviance                    3.93     2.43      5.87     1.19      .02            ns
  Sex role adjustment         6.33     1.59      6.00     1.13      ns             ns

Predicted Behavior in School Scale
  Break rules                 5.87     1.85      6.47     1.92      ns             ns
  Copes with stress           5.67     1.80      3.53     1.51      .002           .05
  Cooperative                 5.27     1.75      5.67     1.29      ns             ns
  Popularity                  5.73     1.67      4.33     1.29      .02            [illegible]
  Leadership                  6.20     1.97      4.00     1.93      [illegible]    .15
  Works without direction     3.67     2.16      2.53      .99      [illegible]    ns

Note. N = 30 (15 per group).
ᵃ t test.
ᵇ Bonferroni multiple-comparisons test.
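The two analysis strategies can be illustrated with a short Python sketch. It makes two assumptions that go beyond the text: the individual comparisons are taken to be equal-n, pooled-variance t tests, and the Bonferroni procedure is taken in its simplest form, in which each individual p value is compared with alpha divided by the number of comparisons. The authors cite Morrison (1976), and their exact computation may differ; the sketch does not attempt to reproduce the experiment-wide values printed in Table 1. Only individual p values that are legible in the table are used, and the function names are hypothetical.

```python
import math


def t_from_summary(mean_1, sd_1, mean_2, sd_2, n_per_group):
    """Independent-groups t for two equal-sized groups (pooled variance)."""
    standard_error = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / n_per_group)
    return (mean_1 - mean_2) / standard_error


def bonferroni_survivors(p_values, n_comparisons, alpha=0.05):
    """Names whose individual p value falls below alpha / n_comparisons."""
    cutoff = alpha / n_comparisons
    return [name for name, p in p_values.items() if p < cutoff]


# Emotional adjustment: intact-home vs. divorced-home group, 15 raters each.
t_value = t_from_summary(6.47, 1.30, 4.33, 1.45, 15)
print(round(t_value, 2), "on", 2 * 15 - 2, "degrees of freedom")

# Legible individual-comparison p values from Table 1.
individual_p = {
    "happiness": 0.0001,
    "gets along with others": 0.004,
    "introversion": 0.04,
    "emotional adjustment": 0.0002,
    "anxiety": 0.02,
    "deviance": 0.02,
    "copes with stress": 0.002,
    "popularity": 0.02,
}
print(bonferroni_survivors(individual_p, n_comparisons=16))
```

Under these assumptions the per-comparison cutoff is .05/16, or about .003, which is why a comparison with an individual p of .004 no longer counts as significant once all 16 tests are considered together.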

Before going further, it is important to 
recognize that in the present study, the 
teachers were given a specific set about the 
nature of the study—that it dealt with the 
effects of divorce. This strategy was deliberate;
that is, in this initial exploratory investigation
of divorce stereotypy, effort was
expended to make the teachers conscious of 
the fact that the study involved children 
from divorced homes as well as children from 
intact homes. In future studies focused on 
divorce stereotypy, it may be wise to include 
a control group in which teachers or other 
social agents simply are asked to rate the 
child(ren) on a set of personality traits with 
no family background information available 
to the rater. If, however, a powerful divorce 
set were operating in the present study, the 
significant differences would have been more 
numerous and would not have fallen into a 
psychologically meaningful cluster. 

There does seem to be stress, conflict, and 
problems involving family dynamics that 
predispose children from father-absent 
homes to be less competent in social and 
cognitive development than children from 
father-present homes (e.g., Lynn, 1974). 
Not all of the burden, however, stems from 
the family per se. The present results 
suggest that the child from a father-absent 
home is likely to be perceived more nega- 
tively by his teachers than a similar child 
from an intact family. 

There was no attempt to evaluate any 
particular type of teacher in the present 
study. Undoubtedly, some teachers are very
careful to avoid negative evaluations of
children from so-called “bad” families. 
Indeed, teachers occasionally go in the op- 
posite direction and bend over backward to 
evaluate children from such families as fairly 
as possible and assist them in whatever way 
they can. Future investigations should attempt
to identify those characteristics of
teachers that render them more or less susceptible
to stereotypes about children from
divorced homes. Such research should also
include boys and girls and male and female
teachers. For example, would the stereotype
operate as strongly with male teachers
and male children?


Curvilinear Relationships Among Anticipated Success,
Cheating Behavior, Temptation to Cheat, and 
Perceived Instrumentality of Cheating 


John P. Houston 
University of California, Los Angeles 


In Experiment 1, forty-five undergraduate subjects were informed that they 
could earn a $10 bonus by performing above average on a free-recall task. Fol- 
lowing a pretest, they were given either a high-, a medium-, or a low-success 
message concerning their recall performance. During a subsequent test, half 
the words were left “carelessly” exposed in such a way that they could be cop- 
ied. Cheating (copying) was related to anticipated success in a curvilinear 
fashion with medium success yielding the most cheating. Experiment 2 indi- 
cated that the success-failure messages did affect expectations concerning 
success, that success was not related to repugnance of hypothetical failure in 
a curvilinear fashion, and that success was related to both temptation to cheat 
and perceived instrumentality of cheating in a curvilinear manner. 


Vitro and Schoer (1972), using grade- 
school students, observed more cheating 
following failure on a pretest than following 
success on that pretest. Similarly, Millham 
(1974) found college students cheated more 
following failure on a simulated intelligence 
test than following success. On the other 
hand, Houston and Ziff (1976) reported that 
subjects given success messages concerning 
their performance on the first of two free- 
recall trials cheated more during the second 
trial than did subjects given failure messages 
concerning their performance on the initial 
trial. Similarly, Jacobson, Berger, and 
Millham (1970) found that when faced with 
potential failure, subjects initially antici- 
pating success cheated more than subjects 
initially anticipating failure. 

At least in part, these conflicting results 
could be due to the fact that cheating be- 
havior may be related to anticipated success 
in a curvilinear, rather than a simple linear, 
fashion. Subjects anticipating near-certain 
success may perceive cheating as unneces- 
sary, and thus refrain from the behavior. 
Subjects uncertain about their chances of 
success may engage in relatively heavy 
cheating in an effort to ensure victory. 


Requests for reprints should be sent to John P. 
Houston, Department of Psychology, University of 
California, Los Angeles, California 90024. 


If such a curvilinear relationship does 
exist, then some of the apparently conflicting 
data are explainable. For example, if one 
chose two points toward the success end of 
a failure-success dimension, then a negative 
relationship between cheating and success 
might appear. But if one chose two points 
toward the failure end of the dimension, then 
a positive relationship between success and 
cheating might materialize. 
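
The sampling argument above can be illustrated with a minimal sketch. The inverted-U function below is purely hypothetical (it is not fitted to any data in this report), and the names `cheating_rate` and `slope` are introduced only for illustration. The sketch shows that two points sampled below the peak yield a positive relationship, whereas two points sampled above the peak yield a negative one.

```python
def cheating_rate(success: float) -> float:
    """Hypothetical inverted-U: cheating peaks at intermediate anticipated success.

    `success` runs from 0.0 (certain failure) to 1.0 (certain success).
    The functional form is an assumption chosen only for illustration.
    """
    return 4.0 * success * (1.0 - success)


def slope(s1: float, s2: float) -> float:
    """Change in cheating per unit change in anticipated success between two points."""
    return (cheating_rate(s2) - cheating_rate(s1)) / (s2 - s1)


# Two points toward the failure end (both below the peak at 0.5).
failure_end = slope(0.1, 0.4)
# Two points toward the success end (both above the peak at 0.5).
success_end = slope(0.6, 0.9)

print(f"failure-end slope: {failure_end:+.2f}")  # positive relationship
print(f"success-end slope: {success_end:+.2f}")  # negative relationship
```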

Experiment 1 was designed to explore the
possibility of a curvilinear relationship
between anticipated success and cheating
behavior. The methods used closely paralleled
those employed by Houston and Ziff (1976). 
Paid volunteers were informed that they 
would win a $10 bonus if they performed 
above average on an upcoming recall task. 
All subjects were administered two succes- 
sive free-recall study-test trials. After the 
first test, one third of the subjects were
informed that they had done very poorly, and
that they had very little chance of winning
the bonus money. One third of the subjects 
were told that they had done well, and that 
they had a fair chance of winning. Members 
of the final third were informed that they 
had done much better than anyone tested so 
far and that their victory was almost certain. 
Following the second study period, all 
subjects attempted recall under conditions 
in which half of the word list had been left
“carelessly” exposed by the experimenter in
such a way that the subjects could copy 
words if they so wished. Cheating was in- 
dexed by a difference between the number 
of recorded exposed and unexposed words. 
Experiment 1 involved a 3 × 2 design (low
success-exposed, low success-unexposed, 
medium success-exposed, medium suc- 
cess-unexposed, high success-exposed, high 
success-unexposed). Experiment 2, a 
questionnaire study, was designed to explore 
two potential explanations of the curvilinear 
effect obtained in Experiment 1. 


Experiment 1 


Method 


Subjects. Forty-five paid volunteer college students 
served as subjects. Subjects were assigned to 15 blocks 
of three randomly arranged success conditions as the 
subjects appeared in the laboratory. 
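
A blocked assignment of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the function name `assign_blocks`, the use of a seeded pseudorandom generator, and the condition labels are all hypothetical.

```python
import random
from typing import List, Optional

CONDITIONS = ("low", "medium", "high")


def assign_blocks(n_blocks: int, seed: Optional[int] = None) -> List[str]:
    """Return one condition per arriving subject, randomized within blocks of three.

    Each block contains every success condition exactly once, so the
    conditions stay balanced as subjects appear in the laboratory.
    """
    rng = random.Random(seed)
    schedule: List[str] = []
    for _ in range(n_blocks):
        block = list(CONDITIONS)
        rng.shuffle(block)  # random order of the three conditions in this block
        schedule.extend(block)
    return schedule


# 15 blocks of three conditions cover the 45 subjects.
schedule = assign_blocks(15, seed=0)
print(len(schedule), schedule[:6])
```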

Materials. The words used were the same as those 
employed by Houston and Ziff (1976). The 90-item 
free-recall list was composed of nouns drawn from the 
Thorndike and Lorge (1944) lists. All words had fre- 
quency ratings of A or less. The entire list was ran- 
domized 15 times with each random order being pre- 
sented to one subject in each of the three success con- 
ditions. Each random order was divided in half, and 
each half was typed on a separate sheet of paper. The 
words were double spaced, capitalized, and typed in 
three columns of 15. Thus, during the study phases, 
each subject was exposed to the 90 words in the form of 
two sheets of paper, placed side by side, each containing 
45 words. Placement of the halves to the left and the 
right was reversed between the first and second study 
phase. During the second test phase the halves of each 
random order were left “carelessly” exposed with
approximately equal frequency. All three success groups
were treated identically with respect to these materials.

Procedure. Subjects were tested individually. 
Each subject faced the experimenter across a 2-m table 
in an otherwise empty room and faced the door from a
distance of approximately 3 m. All subjects were told
that they would be shown a list of words, that they
would study the words for 2 min, and that they would
then attempt to recall the words. Before being shown 
the list, the subjects were told that they were competing 
against the other subjects, and that if they performed 
above average they would receive a bonus payment of 
$10. Paid volunteers, presumably interested in mon- 
etary gain, were employed to ensure that the bonus 
money would be a powerful incentive. 

The two recall sheets, each containing 45 words, were 
placed in front of the subject. During the 2-min study 
phase the experimenter turned away from the subject 
and remained quiet. At the end of the study interval
the sheets were removed and the subject was provided
pencil and paper with instructions to write down as
many of the words as could be recalled, in any order. 
The subject was told that he or she had 3 min for recall, 
and that a second study-test trial would follow. The 
experimenter left the room for 3 min, leaving the door 
ajar. No words were left exposed during this first 
test. 

Upon returning, the experimenter picked up the 
subject’s recall sheets, looked at them, and delivered the 
appropriate low-, medium-, or high-success message. 
The subject was told that the first test had been a trial 
run and that his or her performance on the upcoming 
trial would determine whether he or she would win 
bonus money. During the second study-test sequence 
the subject again had 2 min to study the word list, and 
3 min for recall. In this sequence, when the experi- 
menter collected the word sheets at the end of the study 
period, he “carelessly” left the appropriate sheet face 
up on the table next to the rest of his materials, some 2 
m from the subject. Thus, half the word list was left 
exposed during the 3-min interval when the experi- 
menter was gone from the room. Although upside down 
from the subject's position, the words were easily 
readable. Following the experiment the rights of the 
subject were protected as described in Houston and Ziff 
(1976). 


Results and Discussion 


Means and standard deviations of the 
number of words recalled correctly by the 
subjects in the three success groups on the 
first, or noncheating, trial are shown in Table 
1. A repeated measures analysis of variance
indicated that these means did not differ 
from one another (Fs < 1). This implies 
that the three success groups were equal in 
terms of free-recall learning ability and that 
the exposed and unexposed words did not 
differ in terms of difficulty. 

Table 1 also contains means and standard 
deviations obtained during the second, or 
cheating, trial. The success main effect was 
not significant, F(2, 42) < 1. The exposed-unexposed
main effect was significant, F(1, 42) = 55.76, p < .01,
and the interaction was significant, F(2, 42) = 29.16,
p < .01. Subsequent comparisons indicated that
low-exposed scores did not differ from low-unexposed
scores, F(1, 42) < 1. Medium-exposed scores exceeded
medium-unexposed scores, F(1, 42) = 108.74, p < .01. In
the high-success condition, high-exposed values
exceeded high-unexposed values, F(1, 42) = 5.30,
p < .05. Newman-Keuls comparisons among the three
unexposed means
indicated that the medium-unexposed mean 
was significantly smaller than both the
low-unexposed and high-unexposed means
(p < .05). Low-unexposed and high-unexposed
values did not differ significantly.
Newman-Keuls comparisons among the
three exposed means indicated that medium-exposed
values exceeded both the
low-exposed and high-exposed values (p <
.01). High-exposed and low-exposed values
did not differ significantly.


Table 1
Means and Standard Deviations of Number of Exposed and Unexposed Words Recorded
Correctly in Three Success Conditions

                     First test                    Second test
Success     To-be-exposed   To-be-unexposed     Exposed    Unexposed
level           half             half             half        half

High
  M             6.33             7.00            13.00       10.73
  SD            2.71             3.01             4.08        3.46
Medium
  M             6.85             6.43            18.00        7.73
  SD            2.65             2.87             4.79        3.36
Low
  M             7.12             6.55            11.80       11.60
  SD            2.99             2.43             3.43        3.50

These results may be taken as support for
the hypothesis that cheating is related to 
level of anticipated success in a curvilinear 
manner. Maximum cheating appears to 
have occurred in the medium-success con- 
dition. No cheating was observed in the 
low-success condition. Significant cheating 
occurred in the high-success condition (as 
indexed by the significant difference be- 
tween the high-exposed and high-unex- 
posed values). But the amount of cheating 
that occurred in the high-success condition 
appears to have been less than that which 
occurred in the medium-success condition, 
as indexed by the fact that medium-exposed 
values exceeded high-exposed values and 
that high-unexposed values exceeded me- 
dium-unexposed values. 
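
The cheating index (exposed minus unexposed words recorded on the second test) can be recomputed from the second-test means reported in Table 1. The sketch below is illustrative only: the variable names are hypothetical, and the linear and quadratic trend contrasts (coefficients -1, 0, 1 and 1, -2, 1) are a standard descriptive device added here, not an analysis reported in the article.

```python
# Second-test means from Table 1: (exposed half, unexposed half).
second_test_means = {
    "low": (11.80, 11.60),
    "medium": (18.00, 7.73),
    "high": (13.00, 10.73),
}

# Cheating index: exposed minus unexposed words recorded.
cheating_index = {
    level: round(exposed - unexposed, 2)
    for level, (exposed, unexposed) in second_test_means.items()
}

# Descriptive trend contrasts across the three ordered success levels.
linear = cheating_index["high"] - cheating_index["low"]
quadratic = (
    cheating_index["low"]
    - 2 * cheating_index["medium"]
    + cheating_index["high"]
)

for level, value in cheating_index.items():
    print(f"{level:>6}: {value:.2f}")
print(f"linear contrast:    {linear:+.2f}")
print(f"quadratic contrast: {quadratic:+.2f}")  # negative sign = inverted U
```

A quadratic contrast that is large and negative relative to the linear contrast corresponds to the inverted-U pattern described in the text, with the index peaking in the medium-success condition.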

It is possible that the obtained results
might be due to massive amounts of cheating
accomplished by a few individuals, or moderate
cheating by a relatively large number
of individuals. The numbers of subjects
recording more exposed than unexposed
words during the second test were counted.
In the low-success condition, 7 subjects
recorded more exposed than unexposed words,
6 subjects recorded more unexposed than
exposed items, and 2 recorded the same
number of exposed and unexposed items. In
the high-success condition the corresponding
values were 10, 5, and 0. In the medium-success
condition all 15 subjects recorded
more exposed than unexposed items. These
values suggest that the results were determined
by a large number of subjects cheating,
rather than heavy cheating by a few individuals.
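
One way to examine such counts is an exact two-sided sign test, which asks how often a split at least this lopsided would arise if exposed and unexposed halves were equally likely to be favored. The sketch below is an illustration only: the sign test is not an analysis reported in this article, ties are dropped as an assumption, and the function name `sign_test_p` is hypothetical.

```python
from math import comb


def sign_test_p(n_pos: int, n_neg: int) -> float:
    """Exact two-sided sign test p value under a 50/50 null hypothesis.

    `n_pos` and `n_neg` are the counts of subjects favoring each direction;
    tied subjects are assumed to have been dropped before calling.
    """
    n = n_pos + n_neg
    k = max(n_pos, n_neg)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper_tail)


# (more exposed than unexposed, more unexposed than exposed), ties omitted.
counts = {"low": (7, 6), "medium": (15, 0), "high": (10, 5)}

for condition, (more_exposed, more_unexposed) in counts.items():
    p = sign_test_p(more_exposed, more_unexposed)
    print(f"{condition:>6}: {more_exposed} vs. {more_unexposed}, p = {p:.4f}")
```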


Experiment 2 


Experiment 2 was designed to answer 
three questions. First, was the manipula- 
tion of success messages in Experiment 1 
effective in producing varied expectations 
concerning second-trial performance? 
Second, is the curvilinear relationship re- 
lated to the fact that failure following initial 
success may be perceived as more repugnant, 
or negative, than failure following initial 
failure (Feather, 1966, 1967, 1969; Feather 
& Simon, 1971)? Houston and Ziff (1976) 
presented evidence which suggested that 
subjects in a condition comparable to the 
present medium-success condition cheated 
more and rated hypothetical failure as more 
unpleasant than did subjects in a low-success 
condition. Their notion was that subjects 
anticipating success may cheat more to avoid 
a particularly unpleasant failure. But the
question remains as to how the high-success
subjects in the present design would rate
hypothetical failure. Is it possible that upon
receiving such an extremely potent success
message, subjects may perceive subsequent
hypothetical failure as less unpleasant than
subjects receiving an intermediate-success 
message? The need to avoid failure may not 
be quite so great following an extremely in- 
tense initial success experience. If this is 
true, then the curvilinear relationship be- 
tween cheating and success may be paral- 
leled by a curvilinear relationship between 
success and repugnance of failure. 

The third question addressed by Experi- 
ment 2 has to do with an alternative expla- 
nation of the curvilinear relationship ob- 
tained in Experiment 1. Specifically, 
subjects receiving low-success messages may 
perceive failure as inevitable and cheating as 
relatively futile. Subjects receiving high- 
success messages may perceive success as 
inevitable, and cheating as unnecessary for 
victory. In other words, medium-success 
subjects may perceive cheating as more in- 
strumental than would either high- or low- 
success subjects. 


Method 


In Experiment 2, subjects were treated identically to 
those in Experiment 1, up to the point when they had 
just received their second study trial. At this point all 
subjects completed a four-item questionnaire designed 
to pursue the hypothesis outlined above. The experi- 
menter again left the room and left the appropriate 
words “carelessly” exposed even though recall was not 
attempted. This was done to recreate as closely as 
possible the conditions in which the subjects were ex- 
pecting to attempt recall when cheating was possible. 
Subjects were 45 paid volunteers, randomly assigned to 
low-, medium-, and high-success conditions. The word 
lists and methods of presentation were identical to those 
in Experiment 1. 

The questionnaire was composed of four scales 
mimeographed on two stapled sheets of paper. Before 
leaving the room the experimenter told the subject to 
complete the two items on the first page, and then to go 
on to the second page. The first item asked the subject 
to estimate how likely it was that he or she would win 
$10 by attempting to recall the words at that time. This 
scale ran from “extremely unlikely” to “extremely 
likely.” All scales were composed of a 5-in (12.7 cm)
horizontal line containing small vertical marks every ½
in (1.27 cm). The second questionnaire item asked the
subject to suppose that he or she had failed to win $10, 
and to indicate how unpleasant, or repugnant, such a 
failure would be by placing an X on the scale running
from “not unpleasant at all” to “extremely unpleasant.”
The third item pointed out to the subject the word list
that had been left “carelessly” exposed by the experimenter.
The item asked the subject to indicate, by
placing an X on a scale running from “not tempted at
all” to “extremely tempted,” how tempted he or she
would be to copy words from the exposed word list if this
were an actual recall test. The final item asked the
subject to indicate the extent to which copying words
from the exposed list would improve his or her chances
of winning $10. This scale ran from “would not improve
chances at all” to “would improve chances a great
deal.”


Results and Discussion 


Scores on all four scales were determined 
by recording the ordinal positions of the 
vertical marks nearest the subjects' X marks 
on the scales, from 0 to 10. Table 2 contains 
the means and standard deviations of these 
scores.
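
The scoring rule can be sketched as follows, assuming that each X is measured as a distance in inches from the left end of the 5-in line and that marks fall every 1.27 cm (one half inch), which gives ordinal positions 0 through 10. The function name `score_response` is hypothetical, and the handling of an X falling exactly midway between two marks (rounded upward here) is an assumption not specified in the article.

```python
import math

LINE_LENGTH_IN = 5.0   # length of each scale line, in inches
MARK_SPACING_IN = 0.5  # distance between vertical marks (1.27 cm)


def score_response(x_position_in: float) -> int:
    """Return the ordinal position (0 to 10) of the mark nearest the X.

    `x_position_in` is the distance of the X from the left end of the line.
    Exact midpoints between marks round upward (an assumption).
    """
    if not 0.0 <= x_position_in <= LINE_LENGTH_IN:
        raise ValueError("X mark must fall on the 5-in line")
    return math.floor(x_position_in / MARK_SPACING_IN + 0.5)


print(score_response(0.2))  # nearest mark is the left end of the line
print(score_response(2.6))  # nearest mark is the midpoint of the line
print(score_response(5.0))  # right end of the line
```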

An analysis of variance indicated significant
differences with respect to the likelihood-of-success
item, F(2, 42) = 26.37, p <
.01. Subsequent Newman-Keuls comparisons
indicated that all three means differed
from one another (p < .01). In other words,
perceived likelihood of success increased as
the potency of the delivered success messages
increased. This may be taken as evidence
for the effectiveness of the message
manipulation in producing significant variations
in success anticipations.


Table 2
Means and Standard Deviations of Responses
to Questionnaire Items in Three Success
Conditions

                                   Success
Questionnaire item          Low     Medium     High

Likelihood of success
  M                         4.60     6.67      8.73
  SD                        1.74     2.44      1.32
Repugnance of failure
  M                         2.53     6.40      5.53
  SD                        1.13     2.25      1.25
Temptation to cheat
  M                         4.27     6.13      2.87
  SD                         .98     1.42      1.58
Improvement of chances
  M                         3.80     5.73      3.87
  SD                        1.80     2.09      1.99


With respect to the repugnance-of-failure 
item, the overall F ratio was significant, F(2, 
42) = 26.46, p < .01. However, subsequent
Newman-Keuls comparisons indicated that 
although the high- and medium-success 
ratings of repugnance both exceeded the 
low-success ratings (p < .01), they did not 
differ from one another. Thus the data offer 
no clear support for the notion that ex- 
tremely high degrees of anticipated success 
will result in a lessening of the perceived 
repugnance of failure. 

The temptation ratings produced by the 
three success groups yielded a significant F 
ratio, F(2, 42) = 20.56, p < .01. Newman- 
Keuls comparisons indicated that all three 
means differed from one another (p < .01).
Medium-success subjects recorded greater 
temptation than either high- or low-success 
subjects. Subjects anticipating either 
probable failure, or almost certain success, 
were less tempted to cheat than subjects 
anticipating a more uncertain success. 

The final questionnaire item, dealing with 
estimates of the usefulness of cheating, also 
produced a significant F ratio, F(2, 42) = 
3.30, p < .05. Newman-Keuls comparisons 
indicated that the low- and high-success 
means both differed from the medium-suc- 
cess mean (p < .05) but did not differ from 
one another. This suggests that the curvi- 
linear relationship between cheating and 
success may be explainable in terms of the 
subjects’ perceptions of the usefulness of 
cheating. Subjects anticipating almost 
certain failure did not cheat because they did 
not see cheating as a viable means of im- 
proving their chances of success. Subjects 
anticipating an easy victory did not perceive 
cheating as an effective means of improving 
their already very good chances of winning. 
Subjects in the medium-success condition, 
on the other hand, saw cheating as a means 
of ensuring a somewhat less than certain 
victory, and appeared to adopt a rather 
Machiavellian attitude (Christie & Geis, 
1970).


Conclusions


Future efforts should pursue the possi- 
bility that the obtained curvilinear rela- 
tionship between degree of anticipated suc- 
cess and cheating might be related to a 
straight linear relationship between degree 
of certainty about outcome and cheating — 
the higher the uncertainty, the greater the 
cheating. In the present experiments, 
subjects in the high- and low-success con- 
ditions were quite certain of their outcomes 
while only the medium-success subjects were 
uncertain.


Sex Differences in Predicting Final Examination Grades:
The Influence of Past Performance, Attributions,
and Achievement Motivation


Marcia S. Halperin and Doris L. Abrams 
University of Delaware 


Undergraduates (43 males and 41 females) in an economics course reported 
their high school and last semester grade point averages, their first and second 
midterm grades, and their final exam predictions. Students also rated the in- 
fluence that ability, effort, task difficulty, and luck had on their performances 
and completed an achievement motivation scale. Regression analyses provid- 
ed support for the attribution model of achievement expectations. All stu- 
dents used ability to explain their successes; however, men attributed failures 
to lack of effort while women often used luck to explain their performances. 
Although men and women earned equal midterm scores and predicted similar 
final grades, men based their predictions solely on the two midterms.
Women’s predictions were significantly affected by both midterm performance and
achievement motivation.


In recent years, numerous researchers 
(e.g., Bar-Tal & Frieze, 1975; Feather, 1969; 
Feather & Simon, 1971; Frieze, 1976; 
McMahan, 1973; Simon & Feather, 1973; 
Weiner et al., 1971) have used an attribu- 
tional model to interpret individual differ- 
ences in achievement-related beliefs and 
behaviors. Weiner et al. (1971) proposed 
that achievement motivation and expecta- 
tion of success or failure depended upon how 
individuals explained their own
performances. Weiner et al. classified four
possible explanations (ability, effort, task
difficulty, and luck) into a 2 × 2 model, with
stability and locus of control as the two di- 
mensions. Ability was considered an in- 
ternal stable attribution, effort was consid- 
ered an internal unstable attribution, task 
difficulty was considered an external stable 
attribution, and luck was considered an ex- 
ternal unstable attribution. These four at- 
tributions were then linked to elements in 
the classic theory of achievement motivation 
(Atkinson, 1974; McClelland, Atkinson, 
Clark, & Lowell, 1953). According to 


The authors wish to thank Roberta Golinkoff for her 
valuable comments on an earlier draft and David Black 
for the use of his class. 

Requests for reprints should be sent to Marcia S. 
Halperin, Department of Educational Foundations, 
University of Delaware, Newark, Delaware 19711. 


Weiner (1974), the locus of control dimen- 
sion influences affect, with internal attri- 
butions leading to greater feelings of pride 
and shame, while the stability dimension
influences expectancy of success, with stable 
attributions encouraging consistent expec- 
tations and unstable attributions encour- 
aging expectations of change. Weiner pro- 
posed that individuals with high achieve- 
ment motivation attribute their successes to 
ability and their failures to unstable or ex- 
ternal factors. Individuals with low 
achievement motivation attribute their 
failures to ability and their successes to un- 
stable or external factors. 
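
The 2 × 2 classification and the consequences Weiner (1974) attaches to each dimension can be summarized in a small lookup structure. The sketch below is only an illustration of the model as described above; the names `Attribution`, `WEINER_MODEL`, and `expected_effects` are hypothetical.

```python
from typing import Dict, NamedTuple


class Attribution(NamedTuple):
    locus: str      # "internal" or "external"
    stability: str  # "stable" or "unstable"


# The four causal explanations placed in the 2 x 2 model.
WEINER_MODEL: Dict[str, Attribution] = {
    "ability": Attribution("internal", "stable"),
    "effort": Attribution("internal", "unstable"),
    "task difficulty": Attribution("external", "stable"),
    "luck": Attribution("external", "unstable"),
}


def expected_effects(cause: str) -> Dict[str, str]:
    """Describe the affect and expectancy consequences of one attribution."""
    attribution = WEINER_MODEL[cause]
    affect = (
        "greater pride or shame"
        if attribution.locus == "internal"
        else "less pride or shame"
    )
    expectancy = (
        "consistent expectations"
        if attribution.stability == "stable"
        else "expectations of change"
    )
    return {"affect": affect, "expectancy": expectancy}


for cause in WEINER_MODEL:
    print(cause, expected_effects(cause))
```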

These patterns of attributions have been 
found most consistently with male subjects 
(e.g., Ames, Ames, & Felker, 1976; Kukla,
1972; Weiner et al., 1971). Studies with
females have yielded somewhat different 
results. In explaining success, women were 
more likely to use external attributions than 
were men (Feather, 1969). Furthermore, 
several researchers (Bar-Tal & Frieze, 1975; 
Deaux & Farris, 1977; Simon & Feather, 
1973) found that females used luck to ex- 
plain both success and failure more often 
than did males. Thus, it appears that many 
women do not take responsibility or credit 
for their performances. This tendency 
would seem to undermine achievement
aspirations and expectancies for success. In
fact, numerous studies have found that 
women generally have lower performance 
expectancies than do men, even when they 
have comparable abilities (for comprehen- 
sive reviews of this research, see Frieze et al., 
in press; Lenney, 1977). 

Most investigations in this area have been 
conducted in artificial settings. Only one 
study (Simon & Feather, 1973) examined
attributions in the natural context of a col- 
lege classroom. Prior to taking an intro- 
ductory psychology exam, students were 
asked to rate their ability, their effort in 
preparation, their expectation of test diffi- 
culty, and their confidence in passing the 
exam. Then, 2 weeks later, students were 
asked to rate the degree to which ability, 
luck, preparation, and task difficulty had 
influenced their performances. Successful 
students—both males and females—at- 
tributed their performances to ability. 
However, males generally rated their ability 
higher than did females, and only females 
who were confident prior to taking the exam 
attributed their success to ability. 

Although the Simon and Feather results 
provide supporting evidence for the attri- 
bution model, the data are limited. For 
example, the study did not assess students’ 
academic histories or their performances 
during the earlier portion of the course— 
factors that might influence student confi- 
dence. If students perceived inconsistencies 
between past and present performances, the 
stability dimension of the attribution model 
should be affected. Furthermore, the study 
did not attempt to measure individual dif- 
ferences in achievement orientation. 
Therefore, the next logical step is to extend 
the research in natural settings and to con- 
sider a variety of variables that might pro- 
duce individual differences (particularly sex 
differences) in achievement expectations. 

The present study was based on the hy- 
pothesis that a student’s expectation of a 
final examination grade could be a function 
of past academic performances, current 
performance within the specific course, ex- 
planations for these performances, and an 
individual achievement-oriented disposition.
In the case of sex differences, perhaps
women are less confident than men because
their past performances are lower or less
consistent. Perhaps women generally have 
lower achievement dispositions. Or perhaps 
males and females differ in how they evalu- 
ate and weigh similar academic behaviors. 
College students from an introductory 
economics class were asked to report their 
academic histories (high school grade point 
average and last semester grade point
average), their performances in the course (first
and second midterms), and their predictions
of the final examination grade. They also 
indicated the degree to which ability, effort, 
task difficulty, and luck affected and would 
continue to affect their performances in the 
course. In addition, students completed a 
scale that assessed their achievement-ori- 
ented dispositions. Then, bivariate and 
multiple regression analyses were used to 
determine which variables best explained 
the way males and females formed future 
achievement expectations. 


Method 
Subjects and Setting 


During the spring semester, 84 college students (43 
males and 41 females) were recruited from one intro- 
ductory economics class at the University of Delaware. 
Although the class contained several freshmen and se- 
niors, the majority of students were sophomores and 
juniors. All students who attended class on the day the 
study was conducted served as subjects. 


Materials 


Questionnaire on academic performance. A questionnaire
was developed to assess how students perceived
their academic performances. The questionnaire
asked students to report their high school grade
point averages, their college grade point averages for the 
previous semester, and their grades on the exams and 
to predict their grades on the final exam. In addition, 
the questionnaire asked students to explain why they 
performed as they did on the two economics exams and 
why they predicted their specific final exam grade. 

The measures of causal attributions were based on 
the four-component model proposed by Weiner et al. 
(1971). For each of the two midterms and the final 
examination, students rated on separate 7-point scales 
the degree to which their performances were determined 
(or would be determined) by ability (1 = low ability, 4 
= no effect, and 7 = high ability), effort (1 = did not try, 
4 = no effect, and 7 = tried very hard), task difficulty 
(1 = hard test, 4 = no effect, and 7 = easy test), and luck 
(1 = bad luck, 4 = no effect, and 7 = good luck). Thus, 
the students completed 12 attribution scales, 4 for each
of the 3 exams. These separate scales enabled students
to indicate how they perceived the independent effects 
of ability, effort, task difficulty, and luck. 

Measure of achievement motivation. Fifteen items 
that appeared on both the male and female versions of 
the Mehrabian (1972) Measure of Achieving Tendency 
were selected to form a measure of achievement
motivation. Using a scale from +3 to -3, students indicated
how much they agreed or disagreed with each of the 15 
statements. In general, the items on this scale mea- 
sured whether an individual preferred to tackle chal- 
lenging tasks and to assume responsibility for her or his 
performances. 


Procedure 


The study was conducted approximately 1 week after 
students took the second economics exam, at the first
class after the instructor returned the graded tests.
When the instructor finished his lecture for the day, he
introduced the researchers who then described the 
project to the students. The investigators explained 
that they were interested in how individuals perceived 
their academic performances. They distributed the 
questionnaires and asked students to complete the two 
self-report measures. The researchers emphasized that 
participation in the study was voluntary and that all 
responses would be kept confidential. To insure ano- 
nymity and to increase the likelihood of honest reports, 
the researchers asked that students not put their names 
on their answer forms. All students filled out the 
questionnaires. The entire process took 10 to 15
minutes.


Results 


Sex Differences in Performance, 
Attributions, and Achievement 
Motivation 


Independent t tests comparing males and 
females were computed on all the variables. 
Because the economics exams were scored by
letter, the grades were converted to an ordinal
scale (A = 9; A-, B+ = 8; B = 7; B-, C+ = 6;
C = 5; C-, D+ = 4; D = 3; D- = 2; F = 1).
The achievement motivation scale
was scored according to Mehrabian’s (1972) 
criteria with high scores indicating high 
achievement motivation. 
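
The letter-to-ordinal conversion can be expressed as a simple lookup table. The mapping below follows the scale given in the text; the names `GRADE_TO_ORDINAL` and `to_ordinal` are hypothetical, and grades not listed in the text (for example, A+) are deliberately left undefined.

```python
# Ordinal values for letter grades, as listed in the text.
GRADE_TO_ORDINAL = {
    "A": 9,
    "A-": 8, "B+": 8,
    "B": 7,
    "B-": 6, "C+": 6,
    "C": 5,
    "C-": 4, "D+": 4,
    "D": 3,
    "D-": 2,
    "F": 1,
}


def to_ordinal(letter_grade: str) -> int:
    """Convert a letter grade to its ordinal value; unlisted grades raise KeyError."""
    return GRADE_TO_ORDINAL[letter_grade.strip().upper()]


midterm_grades = ["B+", "C", "A-", "D"]  # hypothetical example grades
print([to_ordinal(grade) for grade in midterm_grades])
```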

Overall, women had better high school 
grade point averages than did men (for fe- 
males, M = 3.50; for males, M = 3.02) and 
higher grade point averages for the preceding 
semester (for females, M = 2.96; for males, 
M = 2.60), t(82) = 5.24, p < .001, and t(82) 
= 2.73, p < .01, respectively. However, 
there were no significant sex differences in 
midterm exam performances, final exam 
prediction, attributions, or achievement 
motivation. 

To determine whether the patterns of re- 
lationships among academic measures were 
similar for men and women, Pearson prod- 
uct-moment correlation matrices were cal- 
culated separately for males and females. 
Table 1 presents the resulting intercorrela- 
tions. 
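
Computing correlations separately for each sex amounts to splitting the records by group and applying the product-moment formula within each group. The sketch below is a minimal illustration: the exam scores are invented solely for the example, and the function and variable names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical records: (sex, first exam grade, second exam grade), ordinal scale.
records: List[Tuple[str, int, int]] = [
    ("M", 7, 8), ("M", 5, 6), ("M", 9, 7), ("M", 4, 5), ("M", 6, 6),
    ("F", 8, 9), ("F", 6, 5), ("F", 7, 7), ("F", 5, 6), ("F", 9, 8),
]

# Split the two exam variables by sex, then correlate within each group.
by_sex: Dict[str, Tuple[List[int], List[int]]] = {}
for sex, first, second in records:
    firsts, seconds = by_sex.setdefault(sex, ([], []))
    firsts.append(first)
    seconds.append(second)

r_by_sex = {sex: pearson_r(xs, ys) for sex, (xs, ys) in by_sex.items()}
for sex, r in r_by_sex.items():
    print(f"{sex}: r = {r:.2f}")
```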

In general, the performance measures
were correlated. For both males (r = .36)
and females (r = .48), the correlations
between grade on the first midterm and grade
on the second midterm were significant (p
< .01). Last semester grade point average
also correlated significantly with grades on
the two midterms, with probability at least
at the .05 level. Also, high school grade
point average was significantly correlated with
all other measures of academic performances
for the males (p < .01) but not for the
females. For the women, high school grade
point average correlated with only the second
midterm (p < .05). Thus, the students
in this study appeared to be moderately
consistent in their academic behavior.
However, the females seemed to perceive
some discrepancy between their college and
high school performances.


Table 1
Intercorrelations for Males and Females on Measures of Academic Performance

Measure                         1            2            3            4            5

1. High school GPA             --           09           18           26*      [illegible]
2. Last semester GPA           49**         --       [illegible]      46***        53***
3. First exam grade        [illegible]  [illegible]      --           48***        66***
4. Second exam grade           51***        37**         35**         --           66***
5. Final exam prediction       43**         32*          44**     [illegible]      --

Note. Males (n = 43) are below the diagonal; females (n = 41) are above the diagonal. Decimal points are omitted. GPA = grade
point average.
*p < .05.
**p < .01.
***p < .001.


Table 3
Intercorrelations for Males and Females on Exam Performances and Attributions

                                          Males                                          Females
                                                   Task                                           Task
Exam performance         Ability      Effort      difficulty   Luck     Ability   Effort   difficulty   Luck

First exam grade          58**         49**         36*         37*      68**       22        09         62**
Second exam grade         60**      [illegible]     11         -10       54**       38*       06         48**
Final exam prediction  [illegible]  [illegible]     20          07       49**       08        15         33*

Note. Decimal points are omitted. For males, n = 43; for females, n = 41.
*p < .01.
**p < .001.

Pearson product-moment correlations 
were also calculated among the various at-


Table 4
Correlations Between Grades and Attributions Computed Separately for Students Above and Below Performance Medians


Performance 
median Ability Effort Task Luck 
First midterm 
Above 


Males (n = 20) 53** 28 29 


Females (n = 19)  62** 51* 20 21 
Below 

Males (n = 23) 35* 43* 29 23 

Females (n = 22) 45* -13 04 55** 


Second midterm 


Above 
Males (n = 18) Diet) E 33 38 
Females (n = 20)  65** 24 -12 18 
Below 
Males (n — 25) DIS P69***. 0] —18 
Females (n = 21) —05 33  -07 28 
Final prediction 


Above 
Males (n = 22) 39* 08 09 —02 
Females (n = 22) | 55** —02 —38* 06 
Below 
Males (n = 21) 14 —IHT 49** 23
Females (n = 19) 39* -18 03 17
Note. Decimal points are omitted.
*p < .05.
**p < .01.
***p < .001.


tribution ratings. These intercorrelations 
are presented in Table 2. Both males (m) 
and females (f) seemed consistent in ex- 
plaining their past academic performances. 
For example, the correlations between the 
attributions for the two midterms indicated 
that the two ability attributions (rm = .38;
rf = .39), the two effort attributions (rm =
.38; rf = .56), the two task attributions (rm =
.64; rf = .59), and the luck attribution for
women (rf = .43) were all significant at p <
.01. For men, the correlation between luck
attributions approached significance (rm =
.23, p < .10).

Table 3 reveals some differences between 
the males’ attributions and the females’ at- 
tributions. Although both sexes indicated 
that ability strongly affected their performances
on the first exam (rm = .58; rf = .68;
p < .001) and on the second exam (rm = .60;
rf = .54; p < .001), males seemed to place
more emphasis on effort than did females.
For the first exam, the correlation between
grade and effort attribution was .49 for males
(p < .001) and .22 for females (p < .10). For
the second exam, the correlations were .59
(p < .001) and .38 (p < .01) for males and
females, respectively. In contrast, females
were more likely to use luck in explaining
their academic performances (r = .68 for the
first exam, p < .001; r = .48 for the second
exam, p < .001) than were men (r = .37 for
the first exam, p < .01; r = -.10 for the
second exam).


Attributional Patterns for Successful and 
Unsuccessful Students 


To examine whether males and females 
differed in how they explained success and 
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Regression of Final Exam Prediction on Past Academic Performances, Attributions, and 


Achievement Motivation 




Males Females Males Females 
Variable B SE B SE B B 

High school GPA —.262 444 622 457 —.095 147 
Last semester GPA .004 .331 .028 .265 .002 .013 
First midterm .226 .091 -230 .084 .948* 362* 
Second midterm .379 .108 .237 .092 .552** 347* 
Achievement motivation —.004 .024 .043 .020 —.025 .251* 
Final ability attribution 218 .180 .045 132 .216 —.040 
Final effort attribution 314 .214 .136 .189 .203 —.080 
Final task attribution -297 167 .223 110 .281 7197. 
Final luck attribution .054 231 .033 188 .029 —.020 
Constant —.213 —.362 

R .78 .85

R² .61 .73


Note. For males, n = 43; for females, n = 41. 
*p < .05. 
**p < .001.


failure, the correlations between attributions 
and performance were calculated separately 
for students whose grades and predictions 
fell above and below the median. These 
data are presented in Table 4. 

For successful students, the patterns for 
males and females were similar. All used 
ability attributions to explain their success. 
For students falling below the median, the 
patterns were more variable. However, 
there seemed to be sex differences that fit 
with the attribution model. Men tended to 
use lack of effort to explain low grades on the 
midterm exams. Women seemed to be less 
consistent, using low ability and bad luck to 
explain low performance on the second 
exam. In predicting a low final grade, males 
tended to blame the task (too difficult), 


while females tended to blame themselves 
(low ability). 


Sex Differences in Predicting Final 
Examination Grade 


To determine whether men and women 
used different information to predict future 
success, multiple regression equations were
computed separately for males and females. 
Final exam prediction was the criterion 


GPA = grade point average. 


variable. Past academic performances (high 
school grade point average, last semester 
grade point average, first midterm, and 
second midterm), attributions for the pre- 
diction, and achievement motivation score 
served as the predictor variables. If males 
and females are similar in how they form 
future expectations, the regression weights 
in the two equations should be similar. 
Table 5 presents the two resulting equations.

For the males, the multiple regression
equation accounted for 61% of the variance,
F(9, 33) = 5.65, p < .001. Examination of
the beta weights indicated that the second
midterm grade made the greatest contribution
to the prediction equation (p < .001),
while the first midterm grade made the only
other significant contribution (p < .05). For
the females, the multiple regression equation
accounted for 73% of the variance, F(9, 31)
= 9.11, p < .001. First midterm grade, second
midterm grade, and achievement motivation
all contributed significantly to the
prediction of the final exam grade (p < .05).
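
The overall F ratios reported above follow from the squared multiple correlation, the sample size, and the number of predictors. The sketch below applies the standard formula F = (R²/k) / ((1 − R²)/(n − k − 1)). It assumes the three-decimal R² values (.606 and .726) that appear in the final rows of Table 6; the function name is illustrative.

```python
def f_from_r_squared(r_squared, n, k):
    """Overall F test for a regression with k predictors and n cases.

    Returns (F, df_model, df_error).
    """
    df_model = k
    df_error = n - k - 1
    f_value = (r_squared / df_model) / ((1.0 - r_squared) / df_error)
    return f_value, df_model, df_error


# R-squared values taken from Table 6 (all four predictor blocks entered).
for label, r2, n in (("males", 0.606, 43), ("females", 0.726, 41)):
    f_value, df1, df2 = f_from_r_squared(r2, n, k=9)
    print(f"{label}: F({df1}, {df2}) = {f_value:.2f}")
```

With nine predictors the error degrees of freedom are 33 for the men and 31 for the women, matching the reported tests. The computed F values agree with the reported ones to within rounding of R².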

Because the predictor variables are cor- 
related with each other, the comprehensive 
regression equations do not reveal the total 
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Table 6 




Variance in Final Exam Predictions Explained by Predictor Variables Individually and in 


Combination 


Predictor variable          Attributions   Exam performances   Academic history   Achievement motivation
Males (n = 43)
  Individual effects           .274*           .453***             .198*              .000
  + Attributions                —              .600*               .371               .283*
  + Exam performances          .600***          —                  .606***            .602***
  + Academic history           .606            .606                 —                 .606
  + Achievement motivation     .606            .606                .606                —
Females (n = 41)
  Individual effects           .289*           .586***             .360***            .319***
  + Attributions                —              .672*               .493               .431
  + Exam performances          .612***          —                  .686**             .709***
  + Academic history           .686            .686                 —                 .726
  + Achievement motivation     .726*           .726*               .726*               —


Note. All attributions for final exam performance (ability, effort, task difficulty, and luck) are entered in the same step; both
midterm performances are entered in the same step, and both measures of academic history (high school grade point average and
last semester grade point average) are entered in the same step. Column proportions reflect the change in total variance produced
by adding predictor variables to the regression equations. Significant increases in R² are indicated by asterisks.
*p < .05.
**p < .01.
***p < .001.


relationship between each predictor and the 
final exam expectations. Therefore, eight 
stepwise regressions (four regressions for 
each sex) were computed. For each com- 
putation, the order of entering the variables 
was varied. Exam performances (first and 
second test), attributions (ability, effort, task 
difficulty, and luck), academic history (high 
school grade point average and last semester 
grade point average), and achievement mo- 
tivation were entered in four separate steps. 
These data are presented in Table 6. The 
row labeled “individual effects” reports the
variance explained by each predictor when 
it is the first variable entered into the re- 
gression. The subsequent rows indicate the 
increment in explained variance when ad- 
ditional variables are entered. 
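
The significance of each increment can be checked with the usual F test for a change in R². The sketch below is an illustration under stated assumptions, not the authors' computation: it treats the Table 6 entries as cumulative R² values and uses the block sizes given in the table note (two exam scores, four attributions). The function name is hypothetical.

```python
def f_change(r2_reduced, r2_full, n, k_full, k_added):
    """F test for the increase in R-squared when k_added predictors are added.

    k_full is the number of predictors in the larger model.
    Returns (F, df_numerator, df_denominator).
    """
    df_num = k_added
    df_den = n - k_full - 1
    f_value = ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)
    return f_value, df_num, df_den


# Males: exam performances alone (R2 = .453, 2 predictors), then the four
# attributions added (R2 = .600, 6 predictors in total), n = 43.
f_value, df1, df2 = f_change(0.453, 0.600, n=43, k_full=6, k_added=4)
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

Because the predictor blocks are correlated, the same block can yield a large increment when entered early and a negligible one when entered late, which is why the order of entry was varied.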

Individually, exam performances ex- 
plained the most variance for both males and 
females. Attributions seemed to be equally 
important for men and women. However, 
academic history and achievement motiva- 
tion appeared to be more strongly related to 
women's final exam expectancies than to 
those of men. Achievement motivation explained
no variance in men's final exam
prediction even when it was the first variable 
entered into the equation. 


The relative importance of each variable 
can be inferred by examining the increment 
in explained variance when the predictor is 
added at a later step in the regression. Re- 
gardless of when it is entered, exam perfor- 
mance significantly increases the explained 
variance of both the male and female re- 
gression equations. Attributions seem to be 
important for both men and women when 
attributions and exam performances are the 
only variables in the regression equation. 
Attributions are also important for men 
when they are added after achievement 
motivation. However, adding attributions
to the corresponding female equation does
not significantly increase R². In both the
male and female equations, R² is not significantly
increased when attributions are
added after academic history.


Discussion 


The present study adds supporting evi- 
dence to the existing experimental research 
on academic expectations. In a natural 
setting (i.e., a college classroom), men and 
women did not differ significantly in their 
expectations of success on the final exami- 
nation. This finding fits with Lenney's 
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(1977) hypothesis that in a situation where 
feedback is clear, such as midterm test 
scores, females do not predict lower grades 
than do men. However, comparison of the 
two regression equations suggests that the 
process by which males and females reach 
the same conclusions may be different. For 
males, the best single predictor for the final 
exam grade was the grade on the second 
midterm, followed by the grade on the first 
midterm. No other predictor variable was 
significant. For the women, both midterms 
contributed significantly and equally to the 
final exam prediction. Furthermore, 
achievement motivation made a significant, 
independent contribution to females’ ex- 
pectations of the final exam grade. Women 
who liked to tackle challenging tasks and 
assume responsibility for their performances 
were more optimistic in their expectations 
than were women who did not. Thus, the 
achievement motivation score might be 
similar to the confidence variable found in 
the Simon and Feather (1973) study. 

The stepwise regressions suggest the in- 
fluence that attributions have on the for- 
mation of future expectations. For both 
men and women, exam performance and 
attributions explain almost all variation in 
final exam predictions. This pattern indi- 
cates that expectancies are based not only on 
objective performances but also on causal 
interpretations of these performances. 

The pattern of bivariate correlations 
provides at least partial support for an at- 
tribution explanation of the sex differences 
in expectations. For example, the 
women—unlike the men—seemed to see a 
discrepancy between their high school performances
and their college performances.
According to Weiner et al. (1971), inconsis- 
tency in performance tends to encourage the 
making of unstable attributions. Thus, if a
woman does not believe that her college 
performance matches her high school 
achievements, she might be uncertain how 
to explain the results. This study cannot 
determine whether the women’s high school 
achievements were really different from 
their college achievements. But, at the very 
least, the results suggest that many female
students perceived an inconsistency. 
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Analysis of the attributional patterns for 
students performing above and below the 
median offers further support for the Weiner 
et al. (1971) model. Students who were 
above the median on the first and second 
exams attributed their performances to 
ability—a stable, internal cause. Thus, it 
makes sense for these students to use the 
midterm grades to predict their exam scores. 
Because ability is stable and within the 
person, there is every reason to expect that 
it will facilitate performance on the final 
examination. Males who scored below the 
median on the two tests used low ability to 
explain their performances, but they also 
emphasized lack of effort. In giving attri- 
butions for their final exam predictions, the 
men indicated that they could score better 
(because they could try harder); but if they 
did not, the fault lay with the test (task dif- 
ficulty). In contrast, the women seemed to 
be less certain how to explain their low per- 
formances. The final set of attributions 
suggested that women who predicted low 
scores were more likely to blame themselves. 
Thus, the attributional pattern for the low- 
performing men was closer to the adaptive 
pattern suggested by Dweck (1975). She 
found that achievement-oriented behavior 
of second-grade children could be enhanced 
by training the children to attribute their 
failures to lack of effort. 

The conclusions drawn in the present 
study must be made with caution. Because 
the achievement motivation measure was a 
subset of the original Mehrabian scale, it 
may not be a comprehensive reflection of 
achievement motivation. Also, as with most
self-report data, one encounters the diffi- 
culty of accuracy and memory distortion. 
However, since the study was conducted 
immediately after the students received their 
grades on the second midterm, they should 
have remembered their scores. In addition, 
students reported their scores anonymously, 
so they were probably honest. In fact, the 
students’ reports of high school grade point 
averages matched the pattern found in the 
University of Delaware survey of entering 
freshmen. Because the study was concerned
with how individuals perceived themselves 
and their performances and then used these 
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perceptions to predict future grades, a self- 
report procedure seemed to be the best al- 
ternative. 

The results of the study suggest that 
conceptualizing expectancy as a function of 
past academic performances, current per- 
formance, explanations for these perfor- 
mances, and achievement-oriented disposi- 
tion is a useful addition to research on 

achievement-related behavior. Once theoretical
models are developed and tested in
artificial situations, psychologists must start 
to look for corroborating evidence in natural 
settings and with a variety of methodolo- 
gies. 
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Students’ Ratings of Academic Programs: — 
A Study of Structural and Discriminant Validity 


Sharon Derry and Dale C. Brandenburg 
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Undergraduate and graduate students responded to a questionnaire designed 
to measure students’ satisfaction with their academic major programs. Hier- 
archical factor analyses produced three specific factors and one general factor 
that were invariant in undergraduate and graduate samples: student percep- 
tion of value in program, student satisfaction with instruction, student satis- 
faction with faculty mentorship, and overall satisfaction with department. 
Significant differences in mean department subscale scores provided useful 
information for summative evaluation. No conclusions were reached regard- 
ing the diagnostic value of the instrument. 


The use of students’ ratings for judging 
and improving classroom instruction has 
become an accepted tradition on many col- 
lege campuses. At the University of Illinois 
at Urbana-Champaign, student opinions also 
assist in the evaluation of a broader object of 
study, the academic department. In 1973 
this university initiated an evaluation pro- 
gram which requires that each academic 
department conduct a comprehensive self- 
study every 5 years. As part of the self- 
study plan departments survey their student 
majors, using the Program Evaluation Sur- 
vey (PES), a standardized questionnaire 
designed to measure student satisfaction 
with the academic program. 

Satisfaction scores derived from PES data 
are potentially useful in two ways: One is to 
assist university level administrators in 
making comparative judgments across many 
educational units and thus in setting ad- 
ministrative priorities related to those units; 
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a second is to assist both faculty and de- 
partment level administrators in identifying 
strengths and weaknesses within depart- 
ments, thus directing efforts for program 
improvement. These two uses reflect the 
usual distinction between summative and 
formative evaluation, but here they are 
viewed as mutually implicative aspects of an 
ongoing quasi-diagnostic program of insti- 
tutional self-assessment. 

Face validity provides one form of justi- 
fication for linking students’ satisfaction 
ratings to university decision-making pro- 
cesses. Few educators would argue that 
experience in school should not be satisfying. 
Numerous prominent educators have re- 
cently expressed views that student attitude 
is an important and measurable learning 
outcome (Cooley & Lohnes, 1976; Epstein & 
McPartland, 1976; Jencks et al., 1972; Silberman,
1970). The growing popularity of
diagnostic course/instructor evaluation
systems (Brandenburg, Derry, & Hengstler,
Note 1; Derry, Note 2; Kulik, Note 3)
also bears testimony to the widespread acceptance
of student attitude as an important
criterion for evaluating teaching. For these
reasons, students’ satisfaction ratings appear
to have the face validity needed to promote


1 A copy of the PES may be obtained by contacting
Dale C. Brandenburg, Measurement and Research Division,
Office of Instructional Resources, University of
Illinois, Urbana, Illinois 61801.
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their acceptance and use on a college cam- 
pus. 
Other justifications are based on correla- 
tional studies. Pervin (1967) demonstrated 
a consistent negative relationship between 
students’ satisfaction ratings and the lack of 
fit between students and their colleges, a 
variable also correlated with student dropout
rate. Bowan and Kilmann (1975)
showed that satisfaction was negatively 
correlated with discrepancies between stu- 
dents’ desired and perceived control over 
their learning environments. Epstein and 
McPartland (1976) demonstrated that sat- 
isfaction ratings for public school students 
were positively correlated with measures of
student achievement and personal adjustment.
Thus, satisfaction appears to be related
to positive student-environment interaction.
Despite evidence that students’ satisfaction
ratings have adequate face and correlational
validities, the use of such ratings in
academic-program evaluation cannot be
justified solely on the basis of these criteria. 
The aims of program evaluation dictate that 
ratings must serve two practical functions: 
(a) identification of particularly strong and 
weak programs (between-department dis- 
crimination) and (b) identification of strong 
and weak aspects within individual programs 
(within-department discrimination). To 
serve both functions, ratings must possess 
structural and discriminant validity. 
Structural validation of the PES was the 
object of our Analysis 1, which sought to 
describe the factor structure underlying PES 
ratings and to employ this factor structure 
to help explain the dimensionality of student 
satisfaction. Because past studies have 
generally treated student satisfaction as a
criterion for validating other types of measures,
its dimensionality has not been investigated.
One exception is a study conducted
by Epstein and McPartland (1976),
which concluded that satisfaction measured 
by the Learning Climate Questionnaire was 
a multidimensional variable with three 
components: general attitude, attitude 
toward teachers, and commitment to 
school.
The analytic procedure employed by Ep- 
stein and McPartland was factor extraction 
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followed by rotation to orthogonal simple 
structure. The procedure used in our study 
was hierarchical factor analysis (Schmid & 
Leiman, 1957). Hierarchical analysis differs 
from common orthogonal rotation in that it 
provides a quantitative estimate of how 
much general student attitude toward the 
academic environment is reflected in student 
ratings of specific components of that envi- 
ronment. This is similar to evaluating the 
strength of halo effects, as suggested by 
Loevinger (1967, p. 106). 

Because specific factors other than general 
attitude must be measured to produce dif- 
ferences in ratings that “discriminate” de- 
partmental strengths from weaknesses, there 
is a direct relationship between structural 
and discriminant validity. Discriminant 
validation was the purpose of Analysis 2, 
which addressed two questions: First, to 
what extent do within- and between-de- 
partment differences in mean ratings reflect 
true score variance? Second, are differences 
among means large enough to be of practical 
value? Analysis 2 sought answers to these 
questions by (a) observing score differences 
after correcting for the average reliability of 
department means and (b) by comparing 
departmental mean ratings to distributions 
of individual student scores. 


Method 


Subjects and Procedure 


In the fall of 1975, 1,336 graduate students and 3,148 
undergraduates voluntarily and anonymously re- 
sponded to the Program Evaluation Survey question- 
naire as part of their early registration procedures at the 
University of Illinois. A total of 22 academic depart- 
ments participated in the survey by request of the uni- 
versity’s Council on Program Evaluation (COPE), a 
faculty-student committee charged with the responsi- 
bility of conducting cyclical evaluation of all academic 
departments on the Urbana-Champaign campus. The 
student samples represented about 70% of the total 
enrollment for these 22 departments. Descriptive in- 
formation is provided in Table 1, which lists graduate 
and undergraduate student responses to selected de- 
mographic items appearing on the survey instrument. 
Statistics indicate proportional membership in each 
category after the deletion of 228 graduate and 396 
undergraduate subjects due to missing data on one or 
more questionnaire items entered into subsequent 
analyses. 
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Proportional Responses to Program Evaluation Survey (PES) Demographic Items 


Status P   Sex P   No. semesters in major P   Cumulative GPA P


Undergraduates (n = 2,752) 


Freshman .17 Male Al 0-1 31 3.6 or aad y 
Sophomore .24 Female 48 2-3 .33 3.61-4.5 1 
Junior 30 4-5 22 4.6 or above 42 
Senior .29 6 plus 14 T 
Omits 00 Al :00 c 
Graduates (n = 1,108) 
Masters 63 Male 57 0-1 32 3.6 or below OL 
Doctoral .97 Female .34 2-3 24 3.61-4.59 E! 
4-5 Bu 4.6 or above .46 
6 plus 27 
Omits .00 .09 .00 .09 


Note. GPA = grade point average. 


The Questionnaire 


The original PES instrument was developed in 1973 
by a faculty and student committee. In 1974 the 
questionnaire was revised by the Office of Instructional 
Resources. These revisions were necessary in order to 
place the instrument on an optically scannable form and 
were not intended to change the nature of measures 
suggested by the original committee. 

In addition to six demographic questions, the revised 
questionnaire comprised 24 items. With items 1-9 
students reported how far their departments deviated 
from the ideal in such areas as program difficulty, 
structure, admissions standards, and so on. Responses 
to these items produced highly leptokurtic distributions 
and thus very small variances. Because intercorrela- 
tions appeared to be attenuated, items 1-9 were not 
analyzed further in the present investigations. 

The remaining 15 items produced bell-shaped dis- 
tributions and were entered into subsequent analyses. 
Students responded to these items by marking a 5-point 
bipolar scale to indicate level of satisfaction with various 
aspects of their academic departments: classroom in- 
struction, texts and instructional materials, examina- 


tions, academic advice, vocational guidance, and so 
on. 


Results 
Analysis 1 
For items 10-24, matrices of interitem
correlations, with maximum row correlations
replacing the diagonal elements, were factored
separately for graduate and undergraduate
samples. Initial factor solutions
were obtained by Simple Common Factor
Analysis, a program which is part of the
SOUPAC statistical package (Note 4).
Factor extraction was followed by varimax
and oblique rotations. Based on simple
structure approximation, scree tests, and
a procedure suggested by Humphreys and
Montanelli (1976) by which eigenvalues are
compared to those that might occur by
chance, a three-factor oblique solution was
selected for undergraduates and a four-factor
oblique solution for graduates.

Intercorrelations among oblique reference 
axes ranged from .34 to .64. To account for 
these intercorrelations, hierarchical solutions 
were obtained by the Schmid-Leiman 
transformation (Schmid & Leiman, 1957). 
This procedure produces a hierarchical or- 
thogonal solution resulting from extraction 
of higher order factors from matrices of in- 
tercorrelations among oblique lower order 
factors. 
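
The arithmetic of the transformation can be sketched for the simplest case. This sketch assumes a single second-order factor and assumes that the correlations among the oblique first-order factors are reproduced exactly by their second-order loadings. All loadings shown are hypothetical, not PES values.

```python
from math import sqrt


def schmid_leiman(first_order, second_order):
    """Orthogonalize an oblique solution with one second-order factor.

    first_order: item-by-factor pattern loadings (list of rows).
    second_order: loading of each first-order factor on the general factor.
    Returns (general_loadings, specific_loadings).
    """
    general = [
        sum(p * g for p, g in zip(row, second_order)) for row in first_order
    ]
    # Each first-order column is rescaled by the part of that factor
    # not explained by the general factor.
    residual = [sqrt(1.0 - g * g) for g in second_order]
    specific = [[p * u for p, u in zip(row, residual)] for row in first_order]
    return general, specific


# Hypothetical loadings: three items, three oblique first-order factors.
pattern = [
    [0.6, 0.0, 0.0],
    [0.0, 0.5, 0.0],
    [0.3, 0.4, 0.0],
]
higher = [0.8, 0.7, 0.6]
general, specific = schmid_leiman(pattern, higher)
for g, s in zip(general, specific):
    print(round(g, 3), [round(v, 3) for v in s])
```

Under these assumptions each item's communality is unchanged: it is simply split between the general factor and the orthogonalized specific factors.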

Hierarchical factor solutions for under- 
graduate and graduate samples are pre- 
sented in Table 2. Three first-order factors 
were common to both samples: (a) personal 
value in studies, (b) satisfaction with in- 
structional procedures, and (c) satisfaction 
with faculty mentorship. Correcting solutions
for differences in population stand-
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Table 2 
Hierarchical Factor Solutions 
Overall Value Instruction Mentorship Accessibility 
Questionnaire Satisfaction (Factor 1) (Factor 2) (Factor 3) (Factor 4) 
item UG G UG G UG G UG G G 
10 44 .56 .30 PAL 36 
11 .54 54 278 39 
12 60 65 A3 AL 
13 51 .55 50 AT 
14 AQ 56 37 .33 
15 ES .53 -38 138 EL 
16 63 69 - 52 40 
17 56 64 55 .50 
18 57 63 A8 37 
19 31: +288 34 .00* 
) 20 
l 21 
22 63 67 57 ES! 
23 -70 NI AQ 35 
24 40 39 52 51 


Note. UG = undergraduate; G = graduate student. 


' Except for comparative purposes, only loadings over .3 are reported. 


deviations, as suggested by Mulaik (1972,
p. 352), did not alter this interpretation.

A fourth first-order factor emerged for
graduates and was defined primarily by a
single item dealing with faculty accessibility
(item 15). This factor was not retained for
future use because it was not replicated in
the undergraduate sample. A single second-order
factor emerged for both samples
and was labeled “overall satisfaction with the
academic department.” This factor carried
substantial loadings from all PES items except
those dealing with class size (items 20
and 21).

To permit an investigation of the instru- 
ment’s discriminating power, questionnaire 
items with high, stable factor loadings were 
selected to form subscales representing the 
three prominent specific factors. Selected 
items, subscale intercorrelations, and coef- 
ficient alphas are shown in Table 3. These 
coefficients were obtained from unweighted 
composite scores utilizing the student as unit 
of analysis. Because coefficient alphas were 


Table 3 
Reliabilities and Intercorrelations of Program Evaluation Survey (PES) Subscales
) Undergraduate Graduate 
Horst Subscale rs Horst Subscale rs 

Subscale a reliability With F2 — With F3 a reliability — With F2 — With F3 
pen value (F1) .80 83 50 Al .80 68 55 52 
atisfaction with 
_ instruction (F2) 42 .86 49 15 -66 59 
satisfaction with 
_mentorship (F3) .80 .90 .82 .19 



Note. F1, F2, and F3 refer to Factors 1, 2, and 3. Three PES questionnaire items with high, stable factor loadings were chosen
for each subscale or factor; they are as follows. For F1: worth of program, satisfaction with department, and student dedication;
for F2: instruction, tests and materials, and evaluation; for F3: academic guidance, vocational guidance, and faculty-student
communication.
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found to be substantially higher than sub- 
scale intercorrelations, we concluded that 
the three specific factors accounted for a 
meaningful proportion of variance over and 
above that attributed to the general factor 
(overall satisfaction with the academic de- 
partment). 
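
Coefficient alpha for an unweighted composite can be computed directly from item variances and the variance of the total score. The following minimal sketch uses hypothetical ratings on three items. The comparison with subscale intercorrelations made above would then set each alpha beside the correlations between composite scores.

```python
from statistics import variance


def coefficient_alpha(items):
    """Cronbach's coefficient alpha.

    items: list of item score lists, one list per item, respondents aligned.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical 5-point ratings from six students on a three-item subscale.
subscale = [
    [4, 5, 3, 2, 4, 5],
    [4, 4, 3, 2, 5, 5],
    [3, 5, 2, 2, 4, 4],
]
print(round(coefficient_alpha(subscale), 2))
```

When alpha is well above a subscale's correlations with the other subscales, the subscale carries reliable variance that the other subscales do not share.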


Analysis 2 


Using the subscale composites described 
above, subscale means were computed for 
each undergraduate and graduate depart- 
ment. Reliabilities for these means, esti- 
mated according to the Horst (1949) for- 
mula, are also shown in Table 3. On the 
assumption of a normal curve approxima- 
tion, Horst coefficients were used to place 
95% confidence intervals around mean sub- 
scale scores for each department. Confi- 
dence intervals were standardized for each 
factor, using means and standard deviations 
of subscale distributions for individual stu- 
dent scores. To depict departmental com- 
parisons graphically, confidence intervals 
were plotted against these sample distribu- 
tions. Standardized subscale confidence 
intervals for each department were also 
plotted against a single normalized reference 
distribution to depict relative score differ- 
ences for individual programs. 
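
The construction of a standardized interval can be sketched as follows. The text does not give the formula used. This sketch assumes the classical standard error of measurement, SD × sqrt(1 − reliability), applied to department means, followed by conversion to z scores on the student distribution. Every numeric value is hypothetical.

```python
from math import sqrt


def standardized_ci(dept_mean, sd_dept_means, reliability,
                    student_mean, student_sd, z=1.96):
    """95% interval around a department mean, expressed in student z units.

    Assumes a normal approximation and the classical standard error of
    measurement; these are assumptions of this sketch.
    """
    sem = sd_dept_means * sqrt(1.0 - reliability)
    lower = dept_mean - z * sem
    upper = dept_mean + z * sem
    return ((lower - student_mean) / student_sd,
            (upper - student_mean) / student_sd)


# Hypothetical department mean 3.8 on a 5-point subscale; reliability of
# department means .84; student distribution mean 3.5, SD 0.8.
low, high = standardized_ci(3.8, 0.4, 0.84, 3.5, 0.8)
print(round(low, 3), round(high, 3))
```

Lower reliability widens the interval, so fewer departments can be distinguished from the reference mean or from one another.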

Figure 1 illustrates the positions of all 
undergraduate programs relative to the 


Figure 1. 95% confidence intervals for undergraduate
department means on three Program Evaluation Survey
subscales. (Confidence intervals are ranked and
standardized on normalized distributions of individual
student composite scores.)
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means and standard deviations of under- 
graduate student distributions for each of 
the three PES subscales. (The actual student
distributions were slightly negatively
skewed.) For ease of interpretation, de- 
partments were ranked according to their 
positions on each distribution, but this does 
not imply that rank orderings were identical 
for each subscale. 

Figure 1 permits two types of compari- 
sons. If means of student reference distri- 
butions are accepted as appropriate common 
standards, then users can note profiles that 
fail to overlap reference means. As a second 
procedure, the profiles for two departments
might be compared to determine whether or 
not they overlap. Much caution should be 
exercised in making such comparisons. 
Though the significance level is set at .05 for
each comparison, the error rate across
multiple comparisons is necessarily much
higher. Despite this caveat, Figure 1 indicates
that PES ratings do discriminate
among undergraduate departments for
purposes of summative evaluation. These 
discriminations do not appear to be reflect- 
ing differences in student populations. That 
is, in no case did PES subscale correlations 
with student demographics exceed .11. 
Findings were similar for graduate data, 
though lower Horst reliabilities for graduate 
means resulted in longer confidence intervals
and fewer significant between-department 
discriminations. 
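
The caution about multiple comparisons can be made concrete with a short calculation. The sketch below assumes independent comparisons, which overlapping confidence intervals for the same set of departments do not strictly satisfy, so the result is a rough approximation. The count of 20 undergraduate programs is taken from the text; the function names are illustrative.

```python
def pairwise_comparisons(n_groups):
    """Number of distinct pairs among n_groups departments."""
    return n_groups * (n_groups - 1) // 2


def familywise_error(alpha, m):
    """Chance of at least one false positive in m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-comparison level that keeps the familywise rate at or below alpha."""
    return alpha / m


m = pairwise_comparisons(20)  # 20 undergraduate programs
print(m, familywise_error(0.05, m), bonferroni_alpha(0.05, m))
```

With this many pairwise comparisons, at least one spurious difference is very likely at the .05 level. Isolated nonoverlapping profiles therefore deserve less weight than consistent patterns.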

Did standardized confidence intervals also
depict relative strengths and weaknesses 
within individual departments? Nonover- 
lapping subscale profiles occurred within 7 
(out of 20) undergraduate and 2 (out of 22) 
graduate programs, indicating significant 
differences among means related to de- 
partmental strengths and weaknesses. 
Again, longer confidence intervals permitted
fewer discriminations for graduates. 


Discussion 


partments. The most dominant factor 
measured by the PES is “overall satisfaction
with the academic department.” This
general factor subsumes at least three more 
specific factors of satisfaction: (a) personal 
value in studies, (b) satisfaction with in- 
structional procedures, and (c) satisfaction 
with faculty mentorship. If these factors are 
viewed as representing samples from over- 
lapping hypothetical populations of evalu- 
ation questionnaire items, then they provide 
a reference structure for guiding score in- 
terpretation and future instrument devel- 
opment. For example, population 1, sug- 
gested by factor 1, might include other items 
dealing with students’ value perceptions: Is 
the academic program professionally useful? 
Intrinsically enjoyable? Socially relevant? 
Similarly, factors 2 and 3 represent popula- 
tions of items by which students evaluate 
instruction and faculty-student relation- 
ships, respectively. 

This conceptualization may be extended
to explain why a so-called halo effect is often
observed for student rating data. For example,
if the previously mentioned hypothetical
item populations share substantial
membership with a broader, more pervasive 
population of items measuring general sat- 
isfaction, then this communality explains 
rating halo. Though rating halo is often 
regarded as invalid variance, it is viewed here 
as a higher order factor representing legiti- 
mate student differences in general satis- 
faction. The presence and strength of a 
general factor is of important concern when 
assessing the usefulness of questionnaire 
data for diagnosing program strengths and 
weaknesses. Though a strong general factor 
does not necessarily represent invalid vari- 
ance, it will contaminate individual subscales 
with a general measure and will sacrifice the 
diagnostic validity of subscales if it totally 
subsumes all unique variance. 

Determining the presence and strength of 
a general factor was one purpose of Analysis 
1, which used the individual student as the 
unit of study. Results indicated that despite 
the presence of a strong general factor, it was 
possible to measure several unique dimen- 
sions of student satisfaction. However, 
when the department served as the unit of 
study in Analysis 2, only a few significant 
within-department differences in subscale 
means were actually observed. Since some
of these differences could have occurred by
chance, we drew no final conclusions about 
the diagnostic usefulness of PES sub- 
scales. 

Additional studies with selected depart- 
ments should provide a better indication of 
the instrument’s full diagnostic potential. 
These studies should seek to confirm that 
within-department score differences are tied 
to observable departmental characteristics. 
Future research might also seek to improve 
the precision of mean factor measures. In- 
creasing the reliability of department sub- 
scale means would make our comparison 
procedures more sensitive to small score 
differences and would probably increase the 
number of within-department discrimina- 
tions. 

Although our data only tentatively sup- 
port the validity of PES as a measure of 
program strengths and weaknesses, our 
analyses indicate that it is possible to dis- 
criminate among academic departments in 
a summative sense. The reliability of mean
subscale scores was quite high for under- 
graduate programs, permitting frequent 
between-department discriminations at a 
high level of confidence. Fewer discrimi- 
nations were possible for graduate programs 
due to lower reliability estimates for subscale 
means. It should be noted that reliability 
estimates such as the Horst coefficient are 
greatly affected by n, the number of students 
surveyed in each department. Since grad- 
uate programs were generally smaller than 
undergraduate programs, mean reliabilities 
were substantially lower in the graduate 
sample. This would seem to indicate that 
the procedures employed in Analysis 2 are 
not satisfactory for comparing departments 
with small enrollments. However, these 
procedures proved quite useful for compar- 
ing departments with fairly large enroll- 
ments. Whether or not such between-de- 
partment discriminations can be tied to ob- 
jective observable characteristics of pro- 
grams is a matter of importance and should 
be a focus of future research. 
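
The dependence of mean reliability on the number of students surveyed can be illustrated with a short sketch. This is not the Horst coefficient used in the analyses; it substitutes the Spearman-Brown formula for the reliability of a mean of n ratings, and the single-rating reliability of .10 is a hypothetical value chosen only for illustration.

```python
def reliability_of_mean(single_rating_r: float, n_raters: int) -> float:
    """Spearman-Brown reliability of the mean of n_raters parallel ratings."""
    return n_raters * single_rating_r / (1.0 + (n_raters - 1) * single_rating_r)


# Hypothetical single-rating reliability; not a value reported in the study.
single_r = 0.10

for n in (5, 20, 80):
    print(f"n = {n:>2}: reliability of the department mean = "
          f"{reliability_of_mean(single_r, n):.3f}")
```

Under these assumptions a department mean based on a few students is far less reliable than one based on many, which is consistent with the longer confidence intervals observed for the smaller graduate programs.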
Acquiring Teacher Behavior Concepts Through the Use of
High-Structure and Low-Structure Protocol Films 


David Gliessman and Richard C. Pugh 


Indiana University at Bloomington 


This study investigated the effect of protocol films of contrasting structure 
(i.e., presence or absence of formal interpretive framework) on the acquisition 
of teacher behavior concepts and reactions to the filmed treatment. A total 
of 70 preservice and in-service teachers at the graduate level viewed high or 
low structure films instancing a set of six teacher behavior concepts. Pre- and 
posttesting was done to assess gains in concept acquisition measured by the 
identification of concept instances in filmed vignettes. The results show (a) 
significant gains in concept acquisition for groups viewing films of high or low 
structure, (b) no differences in concept acquisition posttest scores among the 
same trainee groups, and (c) significantly more favorable reactions to films
showing a high degree of structure.


A basic goal of teacher education is that 
teachers should acquire and be able to use 
relevant concepts in the interpretation of 
classroom behavior. To increase the likeli- 
hood of achieving this goal, Smith (1969) 
proposed the development of “protocol
materials” for use in the training of teachers.
In such materials, behavioral events are re- 
produced or portrayed (typically on video- 
tape or film) that contain instances of con- 
cepts from the behavioral sciences and other 
disciplines. Smith's reasoning was that 
learning to identify instances of specified 
concepts in such “behavioral records” was
instrumental to the acquisition of those 
concepts as interpretive categories. Stim- 
ulated by this proposal, a large number of 
protocol materials were developed, and some 
evidence on their effectiveness has accu- 
mulated. The results of the studies that 
have been reported are consistent: when the 
criterion task is accuracy in identifying in- 
stances of specified concepts in vignettes of 
classroom behavior, training based upon 
protocol materials has shown significant ef- 
fects on concept acquisition (Borg, 1973; 
Gliessman & Pugh, 1976; Pugh & Gliessman, 
1976; Berliner, Note 1; Kleucker, Note 2). 

Requests for reprints should be sent to David

There is evidence also that when the concepts
to be acquired refer to teacher behavior,
learning to identify instances of those
concepts can contribute directly to the ac- 
quisition of the referent behaviors as teach- 
ing skills. Kleucker (Note 2) and Wagner 
(1973) both report that learning to identify 
instances of a teaching skill on videotape or 
film had a significant effect on the frequency 
of using that skill in a microteaching setting 
without prior overt practice. In the latter 
study, the same result was not achieved 
through practice in the skill alone. San- 
tiesteban and Koran (1977) report that the 
use of audiovisual and audio-based exem- 
plification of teacher behavior concepts, 
again without overt practice, resulted in the 
acquisition of the specified teaching behav- 
iors or skills including acquisition of the 
concepts themselves. Finally, Koran, Snow, 
and McDonald (1971) found that supple- 
menting practice with examples of a teaching 
skill was necessary to produce significant 
change in the use of that skill. 

The available evidence indicates, then,
that protocol films can be instrumental to
the acquisition of concepts as interpretive
categories and also to the development of
teaching skills.¹ Direct evidence is lacking,
however, on the most effective design for
protocol films in terms of their primary
purpose, that of concept acquisition. This
lack of evidence is reflected in the history of
their development. Some of the earlier films
were in fact protocols or “records” of classroom
behavior in which instances of specified
concepts could be identified. This design
format, which conforms closely to
Smith’s original specifications for protocol
materials, requires the trainee to analyze
ongoing behavior for instances of a specified
concept. The potential effectiveness of this
format might be argued on the basis of the
similarity of the interpretive task and stimulus
to that of the criterion test itself:
identifying instances of concepts in vignettes
of classroom behavior (and ultimately in
observed behavior in the actual classroom).
Such similarity between the training condition
and the testing condition is usually assumed
to facilitate transfer (e.g., Ellis, 1965,
pp. 70-74).


1 Since protocol materials in the audiovisual medium
are produced and distributed largely on film, the term
protocol films will hereafter be used. It is predictable
that the data reported in the present studies would have
been essentially the same if they had been generated
through the use of videotape (Pugh, Goodwin, &
Gliessman, 1976).

With actual experience in the production 
and use of protocol films, a few developers 
began to incorporate specific concept 
teaching techniques in the design of their 
films and supporting materials. These 
techniques included portraying a range of 
carefully selected examples, identifying the 
concept category for these examples, and 
highlighting the critical characteristics or 
dimensions of the concepts themselves. 
These and similar design elements are gen- 
erally consistent with suggestions for effec- 
tive concept teaching that have been derived 
from the research on concept learning 
(Clark, 1971; Markle & Tiemann, 1971; 
McDonald, 1965; Hudgins, Note 3). The 
potential effectiveness of these techniques 
might be inferred from the evidence pro- 
vided in that research. 

In the absence of direct evidence on the 
comparative effectiveness of these con- 
trasting formats, the design of protocol films 
can be guided only by inferences from a more 
general research literature. Such guidelines 
may be inappropriate, however, partly be- 
cause the concepts to be acquired through 
protocol films frequently are nominally familiar,
high-inference concepts unlike those on which many
concept learning studies have been based. The purpose
of our present series of studies is to provide the
direct evidence necessary for optimal decisions
about design.

The design variable underlying the differing
formats described above, and of specific
interest in this series of studies, might
be called degree of structure. The term
structure is used here to refer to the framework
of a protocol film with reference to the
instances or examples it portrays. In the
protocol film series “Concepts and Patterns
in Teacher-Pupil Interaction,” the same
teacher behavior concepts are treated in two
subsets of films displaying contrasting degrees
of structure.² Thus, the films in this
series provide a direct means of evaluating
the effect of that variable. In the subset
displaying a high degree of structure, a conceptual
framework is provided within which
instances are categorized under concepts
and concepts are differentiated from one
another; instances are selected and portrayed
that are particularly clear examples
of concepts; and critical characteristics of
concept definitions are emphasized. This
structuring is accomplished through the use
of such filmic devices as narration, internal
titles and graphics, and instant replay.
These high-structure films can be described
essentially as concept teaching materials.

In the subset of low-structure films, in
contrast, no conceptual framework is provided
and no filmic devices are used (except
for selection of excerpts from total film
footage). Instances of the specified concepts
appear as natural events occurring in continuous
classroom activity. The only obvious
structuring is provided by the flow of
classroom events. The films in this subset
are essentially behavioral records to be analyzed
in terms of specified concepts.


2 The distribution of the films is handled by the Indiana University Audio-Visual
Center.


Method


The experimental questions posed were the following:

1. Is there a significant difference in concept ac- 
quisition posttest means between instructional groups 
using high-structure and low-structure films? (Studies 
1 and 3 deal with this question.) 

2. Is there a significant difference in concept ac- 
quisition posttest means among instructional groups 
using high-structure films, low-structure films, and 
high- and low-structure films in combination? (Study 
2 attempts to answer this question.) 

3. Is there a significant difference between or among 
the same groups in trainee perceptions of the instructional
value, conceptual clarity and difficulty, and design
effectiveness of the type of films used? (Studies
1, 2, and 3 investigate this question.) 

In Study 1, a direct comparison was made of the use 
of high-structure and low-structure films. Study 2 was 
undertaken to assess the possible interactive effects of 
using both types of films in a single training group. In 
this study, the comparative effects of using high- 
structure, low-structure, and both high- and low- 
structure films in combination were investigated. 

Study 3 was conducted to check on the possible effect 
of a variation in the instructional treatment that 
emerged during the first two studies. In an effort to 
emphasize the instructional (rather than the experi- 
mental) purpose of the investigation to the trainees, 
allowance was made for any discussion that occurred 
during film viewing. Since this condition applied to all 
instructional groups, it was not anticipated that it would 
be of differential influence. However, informal ob- 
servation during the first two studies indicated a higher 
incidence of discussion among trainees in the low- 
structure groups. The authors interpreted this ob- 
served behavior as an indication that the low-structure 
films posed a more problematic task in identifying 
concept instances. Because of this variation, it would 
be difficult to know the extent to which any posttest 
differences favoring the low-structure groups might be 
due to the interactive effect of discussion rather than 
to the direct effect of the design characteristics of the 
films. If such a difference was attributable to the 
variable of discussion, a problem would arise concerning 
degrees of freedom and the appropriate unit of analysis 
to test the significance of the difference between means. 
To help clarify the possible influence of discussion as 
a variable, then, a direct comparison of high-structure 
and low-structure films was repeated in Study 3 with 
a major change in procedure: To strengthen the argu- 
ment for independence of scores within treatment 
groups, no discussion of concepts or film content was 
permitted within groups during the period of film 
viewing. Although this limitation was antithetical to 
the instructional context of these studies, it was clearly 
necessary to resolve an important interpretive issue. 


Concepts and Instructional Materials 


The films in the “Concepts and Patterns” series are 
based on six concepts (in three pairs) describing teacher 
behavior in an interactive setting. These concepts were 
selected and adapted from interpretive categories 
common to the empirical and theoretical literature on
teacher behavior. The concepts are labeled and defined
(in somewhat abbreviated form) as follows: 

Reproductive question. This is a teacher question 
intended to elicit directly the recall of content specifi- 
cally learned as part of a course or topic of study. In 
response to such a question, the student is expected to 
reproduce accurately such content or to recognize when 
it is accurately reproduced by someone else. 

Productive question. This teacher question is in- 
tended to encourage the production of ideas or combi- 
nations of ideas as opposed to the simple reproduction 
of specifically learned content. A student response to 
such a question may reflect the recall of specifically 
learned content, but that content is used in such pro- 
cesses as interpretation, application, and evaluation. 

Probing. This is a teacher reaction in the form of a 
question or an implied question that pursues some as- 
pect of the substantive content of a preceding student 
response. Such probes typically seek further descrip- 
tion, clarification, explanation, or extension of that 
substantive content. 

Informing. This is a teacher reaction in which in- 
formation is introduced that is related to some aspect 
of the substantive content of a preceding student response.
Such a reaction is often intended to produce
some modification in the substantive content of that 
student response. 

Approving. This is a verbal or nonverbal teacher
reaction that is intended to encourage, or might rea- 
sonably be expected to encourage, continued student 
responding or a continuation of student behavior. 

Disapproving. This is a verbal or nonverbal teacher 
reaction that is intended to discourage, or might rea- 
sonably be expected to discourage, continued student 
responding or a continuation of student behavior. 

As indicated in the introduction, the concepts are 
instanced or exemplified in both high- and low-struc- 
ture films. Furthermore, since the examples portrayed 
in the high-structure films are drawn from the more 
numerous instances recorded in the low-structure films, 
the specific classroom episodes shown are frequently the 
same in both films. To illustrate, the following excerpt 
from a low-structure film shows a secondary-school 
social studies class discussing the problem of population 
control. This excerpt contains at least one instance of 
several of the concepts defined above: 


TEACHER: Ok, let's get someone else's, ah, opinion
on this. Susan?


SUSAN: I don’t think that the government should
regulate it. I think that they should, you know, 
awake the people, tell them what’s going on. And, 
you know, it would be better if you didn’t have two 
children and if you didn’t have any more than two 
children; but if they did, you know, they shouldn’t, 
you know like, do away with the third child. 


TEACHER: Do you think that this is going to get 
the job done, that people are going to listen to, ah, just 
suggestions? 

SUSAN: I think they'll listen, but I don't. . . . It will
help a little. I’m not saying that this will wipe out the
whole problem, but it can be one of the things, you
know. You can have like several plans, you know, to
tell the people and maybe all these combined will help
control it.


TEACHER: Several... like what kind of plans are 
you thinking about? 


SUSAN: Well, like, well, like maybe the govern- 
ment could distribute, uh, in the ghetto areas, some 
kind of contracep . . . or some kind of, uh, birth con- 
trol methods, and this way they can, uh, help. You 
know, help the population. That’s one way, you 
know, with birth control methods ... you know 
and 


TEACHER: Ok, let’s, before you mention any 
others, let's stop right there for a moment. Once, 
once you get into the field of birth control methods, 
now you're getting into an issue that has great moral 
connotations and overtones. 


This dialogue contains two particularly clear instances 
of probing. One of these is emphasized as an example 
of that concept in the following excerpt from a high- 
structure film that utilizes much of the same dia- 
logue: 


NARRATOR: This secondary school social studies 
class is discussing the problem of population growth 
and population control. Listen. 


SUSAN: ... but it can be one of the things, you 
know, you can have like several plans, you know, to 
tell the people and maybe all these combined will help 
control it. 


TEACHER: Several... like what kind of plans are 
you thinking about? 


SUSAN: Well, like, well, like maybe the govern- 
ment could distribute, uh, in the ghetto areas, some 
kind of contracep . . . or some kind of, uh, birth con- 
trol methods, and this way they can, uh, help. You 
know, help the population. That's one way, you
know, with birth control methods ... you know
and


TEACHER: Ok, let's, before you mention any others,
let's stop right there for a moment. Once, once
you get into the field of birth control methods, now 
you're getting into an issue that has great moral 
connotations and overtones. 


NARRATOR: During this interaction the instruc- 
tor influenced the direction and the content of the 
discussion first by probing. 


TEACHER: ...like what kinds of plans are you 
thinking about? 

[Title “Probing” appears on screen over scenes of 
Susan listening to teacher.] 


NARRATOR: The concept of probing is a way of 
describing a teacher's reaction to a student response. 
Listen again to this reaction, beginning with the 
preceding student response. 


[Scene "frozen" on Susan leading up to the fol- 
lowing response.] 


SUSAN: You can have like several plans, you
know, to tell the people and maybe all these
combined will help control it.


TEACHER: Several... like what kind of plans are
you thinking about?


Evident in the above excerpt is the use of several filmic 
devices to provide structure: narration, internal titles, 
repeat of scenes, and "frozen frames." 

The different formats illustrated above are representative
of the contrasting treatments given all six
concepts in the high- and low-structure films. With
either type of film, however, the trainees had concept
definitions available in printed form during the
instructional treatment.


Evaluation Instruments 


The evaluation instrument used to assess concept 
acquisition was the film test “Categorizing Teacher 
Behavior” (Pugh et al., 1976). This film-based test was 
designed to assess accuracy in categorizing instances of 
teacher behavior in terms of the six concepts specified 
in the film series. The test consists of 30 brief classroom 
episodes presented on film. Since the test episodes 
were selected from film footage not included in the 
protocol films themselves, the specific episodes ap- 
pearing in the test film do not appear in the other films 
of the series. However, classroom settings, teachers, 
and pupils are common to the test film and the protocol 
films of both high and low structure. 

The task for the respondent is to determine if the 
filmed episodes do or do not contain instances of the 
specified concepts. The test is divided into three parts, 
each part assessing accuracy in using a different com- 
bination of two concept pairs. Thus, the three parts of 
the test together provide for responding in terms of all 
possible combinations of the three pairs of concepts. 
Separate test items (yes or no options) are provided for 
each of the four concepts presented for an episode. On 
a separate answer sheet, the respondent indicates 
whether or not a specified concept is illustrated in the 
episode. A total of 120 items (40 items for each concept 
pair) are contained in the test. Each episode is presented
twice with a delay of a few seconds between
presentations; the purpose of this repetition is to
minimize the possible influence on test scores of differential
memory for specific episodes. Following the second 
presentation of each episode, the respondent is given 
a 15-sec period in which to record a choice. The total 
time for the test is approximately 35 minutes. 
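
The item counts described above can be checked with a few lines of arithmetic. In this sketch the grouping of the six concepts into three pairs follows the order in which they are defined, and the pair labels and the equal split of 10 episodes per test part are assumptions made for illustration; the text states only the totals.

```python
from itertools import combinations

# Assumed pairing of the six concepts; the pair labels are hypothetical.
concept_pairs = {
    "questions": ("reproductive question", "productive question"),
    "substantive reactions": ("probing", "informing"),
    "evaluative reactions": ("approving", "disapproving"),
}

TOTAL_EPISODES = 30

# Each test part uses a different combination of two concept pairs.
parts = list(combinations(concept_pairs, 2))

# Assumption: the 30 episodes are divided equally among the parts.
episodes_per_part = TOTAL_EPISODES // len(parts)

# Every episode carries one yes/no item for each of the four concepts in play.
concepts_per_episode = 4
total_items = len(parts) * episodes_per_part * concepts_per_episode

# Items belonging to each concept pair (two concepts per pair per episode).
items_per_pair = {
    name: sum(episodes_per_part * 2 for part in parts if name in part)
    for name in concept_pairs
}

print(len(parts), "parts,", total_items, "items in all")
print(items_per_pair)
```

Under these assumptions the sketch reproduces the figures given in the text: three parts, 120 items in all, and 40 items for each concept pair.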

The strategy devised for assessing test reliability and
validity involved evaluating the degree to which the test
items were consistently sensitive to change induced by
planned instruction (Brown & Pugh, Note 4). Evidence
gathered in empirical studies based on this strategy
clearly indicated that the Categorizing Teacher Behavior
test demonstrates such consistency (Pugh et al., 1976).

The reactions of trainees to film-based instruction
were assessed through selected Likert-type items from
an evaluation scale addressed to a variety of qualitative
characteristics of the films. In a previous validation
study of this scale, items were identified that had
significant (p < .05) item-total correlation coefficients for
at least two validation samples (Gliessman & Pugh,
1976). From this pool of reliable items, 13 were selected
to assess trainee perceptions of the instructional value,
conceptual clarity and difficulty, and design effectiveness
of the protocol films used in this study.


Trainees 


The trainees in each study were preservice and in- 
service teachers enrolled in a graduate-level educational 
psychology course. The trainees were heterogeneous 
in terms of teaching specialization and included both 
males and females. 


Procedure 


Since the intent of these studies was that any gener- 
alization be applicable to the design of protocol films as 
instructional materials (rather than research materials), 
an effort was made to fit the experimental procedure as 
naturally as possible into the content and context of the 
instructional setting. To accomplish this, viewing of 
films was done in groups rather than individually; as 
indicated in the previous section, time was allowed in 
the first two studies for any discussion that occurred, 
although discussion was not actively encouraged; the 
trainees were advised that results on the film posttest
would be used as part of the evaluation of their course
work.

In each study, trainees were randomly assigned 
within matched pairs (or trios in Study 2) to the in- 
structional groups. The criteria for matching were sex 
and pretest performance on the film test.³ The sole
purpose of matching was to eliminate the effect of the 
latter two factors as potential nuisance variables. 
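
The matching and assignment procedure can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the roster is invented, the rule of pairing adjacent pretest ranks within each sex is an assumption, and the function name is hypothetical.

```python
import random


def assign_within_matched_pairs(trainees, rng):
    """Pair trainees by sex and adjacent pretest rank, then randomly assign
    one member of each pair to each film treatment.

    trainees: list of (trainee_id, sex, pretest_score) tuples.
    An unpaired trainee left over within a sex is not assigned.
    """
    groups = {"high structure": [], "low structure": []}
    by_sex = {}
    for trainee in trainees:
        by_sex.setdefault(trainee[1], []).append(trainee)

    for members in by_sex.values():
        # Adjacent ranks on the pretest form the matched pairs.
        members.sort(key=lambda t: t[2], reverse=True)
        for i in range(0, len(members) - 1, 2):
            pair = [members[i], members[i + 1]]
            rng.shuffle(pair)  # random assignment within the pair
            groups["high structure"].append(pair[0])
            groups["low structure"].append(pair[1])
    return groups


# Hypothetical roster of 20 trainees with invented pretest scores.
rng = random.Random(0)
roster = [(f"T{i:02d}", "F" if i % 2 else "M", rng.randint(80, 105))
          for i in range(20)]
groups = assign_within_matched_pairs(roster, rng)

for name, members in groups.items():
    mean_pretest = sum(t[2] for t in members) / len(members)
    print(name, len(members), "trainees, pretest mean", round(mean_pretest, 1))
```

Because each pair contributes one member to each group, the groups are balanced on sex by construction and nearly balanced on pretest performance, which is the purpose of matching stated above.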

In the case of all studies, trainees were pretested on 
the film test 1 to 2 weeks prior to instructional treat- 
ment. An overview of interactive teaching in the 
classroom was presented to the trainees in each study 
as a total group. During the instructional treatment
itself, trainees in each group were provided with a
printed summary of the concept definitions and a brief 
set of open-ended questions or directions designed to 
focus attention on the task of observing examples (when 
viewing high-structure films) or finding instances (when 
viewing low-structure films). Emphasis was placed on
the ultimate goal of learning to recognize instances of
the specified concepts.


Table 1
Summary of Instructional Treatments

Study   n    Film treatment        Total film length
1       10   High structure        44 min 10 sec
1       10   Low structure         53 min 35 sec
2       10   High structure        44 min 10 sec
2       10   Low structure         44 min 30 sec
2       10   High/low structure    44 min 55 sec
3       10   High structure        44 min 10 sec
3       10   Low structure         44 min 30 sec

Major procedural details for each study are summarized
in Table 1. The reduction in the range of viewing
times across treatment groups from Study 1 to Studies
2 and 3 resulted from an attempt to equate more carefully
specific viewing times. In the combinative film
treatment in Study 2, high-structure and low-structure
film footage were alternated beginning with a high-structure
film. In this sequence, low-structure film
footage was selected that specifically contained instances
referred to by the concepts presented in each
high-structure film. Finally, as indicated previously, 
discussion of concepts and film content was specifically 
discouraged during Study 3. 

Posttesting on both the film test and film evaluation
scale was done on the day following training in Study
1. In Studies 2 and 3, posttesting immediately followed 
training. In all cases, trainees were posttested as a 
single group. 


Design and Analysis 


Experimental questions as to whether there were 
significant differences in concept acquisition among 
groups viewing different types of film were investigated 
by comparing posttest means on the film test. The tests 
of differences among means were conducted by utilizing 
a randomized block design with one replicate per cell. 
In Study 2, pair-wise comparisons of posttest means 
were tested using Tukey’s honestly significant differ- 
ence (HSD) test. 
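
The randomized block analysis with one observation per cell can be sketched in a few lines. The scores below are invented for illustration and are not the study data; the function name is hypothetical. Each row is one matched pair (block) and each column one film treatment.

```python
def randomized_block_f(scores):
    """F test for treatments in a randomized block design, one score per cell.

    scores[b][t] is the posttest score of the trainee in block b, treatment t.
    Returns (F, df_treatment, df_error).
    """
    n_blocks = len(scores)
    n_treatments = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n_blocks * n_treatments)

    treatment_means = [sum(row[t] for row in scores) / n_blocks
                       for t in range(n_treatments)]
    block_means = [sum(row) / n_treatments for row in scores]

    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_treatment = n_blocks * sum((m - grand_mean) ** 2 for m in treatment_means)
    ss_block = n_treatments * sum((m - grand_mean) ** 2 for m in block_means)
    ss_error = ss_total - ss_treatment - ss_block

    df_treatment = n_treatments - 1
    df_error = (n_treatments - 1) * (n_blocks - 1)
    f_ratio = (ss_treatment / df_treatment) / (ss_error / df_error)
    return f_ratio, df_treatment, df_error


# Hypothetical posttest scores for 10 matched pairs (high, low structure).
pairs = [(101, 104), (98, 103), (105, 104), (99, 106), (103, 107),
         (100, 101), (97, 104), (106, 108), (102, 105), (104, 103)]

f_ratio, df1, df2 = randomized_block_f(pairs)
print(f"F({df1}, {df2}) = {f_ratio:.2f}")
```

With 10 matched pairs and two treatments the degrees of freedom are 1 and 9, and with 10 trios and three treatments they are 2 and 18, which agrees with the F ratios reported for Studies 1 and 2. For two treatments this F is the square of the paired t statistic.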


Results and Discussion 


Study 1 


There was no significant difference, F(1,
9) = 3.00, p > .10, in concept acquisition
posttest means between the group viewing 
high-structure films (M = 101.9) and the 
group viewing low-structure films (M = 
104.9). There were, however, significant 
increases in concept acquisition by both 
groups as a result of the instructional treat- 
ments. The group that viewed the high- 
structure films changed from a pretest mean 
of 95.1 to a posttest mean of 101.9, a differ- 
ence that was highly significant, F(1, 9) = 
10.31, p < .01. The group that viewed the 
low-structure films changed from a pretest 
mean of 95.4 to a posttest mean of 104.9, a 
difference that was also highly significant, 
F(1, 9) = 21.81, p < .01.


3 In the case of Study 3, absences during the instructional
treatment made it necessary to base the final
analysis on trainee pairs matched on the criterion of 
pretest performance alone.

Alternative interpretations of two aspects
of these results were considered and rejected:
First, the findings of a significant increase in 
concept acquisition might be accounted for 
by a practice effect on the test itself, since 
the same form was used for pretest and 
posttest. However, in the validation studies 
of the Categorizing Teacher Behavior test, 
results indicated that although a practice 
effect was discernible, pre-post gains were 
largely attributable to instructional treat- 
ment (Pugh & Gliessman, 1976). Thus, 
previous evidence would indicate that the 
pre-post gains in Study 1 were accounted for 
by practice to only a minor extent. Second, 
the simultaneous findings of a significant 
increase in concept acquisition by both 
groups and a nonsignificant difference be- 
tween group posttest means might occur if 
the film test was not sensitive to change be- 
yond the level achieved by the group with 
the higher posttest mean. However, this 
possibility was not considered tenable be- 
cause, in the same Pugh and Gliessman 
study, instructional treatment resulted in 
posttest means as high as 107. Neither 
treatment in Study 1 resulted in posttest 
means of the same magnitude. Thus, the 
film test was considered sensitive to differ- 
ences that might have occurred between the 
instructional treatments. 

With respect to trainee perceptions of the
instructional characteristics of the films, 
differences in means favored the high- 
structure films for 10 of 13 items relating to 
instructional value, conceptual clarity, and 
design effectiveness. For 3 items, the dif- 
ferences in item means were statistically 
significant, F(1, 17) = 3.41 to 4.30, p < .10.⁴
Judging from reactions to these 3 items, the 
group that viewed the high-structure films 
agreed more strongly with the following 
statements: “I feel I learned a lot from these 
films”; “The most important concepts were 
clearly illustrated in the films”; and “The 
content of the films was easy to the point of 
being obvious.” 


Study 2 


There was a significant difference, F(2, 18)
= 5.76, p < .02, in concept acquisition posttest
means among the groups viewing high-structure
films (M = 99.4), low-structure
films (M = 104.6), and a combination of
high-structure and low-structure films (M
= 102.5). Using Tukey’s HSD procedure,
the comparison of means between the group
viewing high-structure films and the group
viewing low-structure films was highly significant
(p < .01), but pair-wise comparisons
of means of the group viewing a combination
of high- and low-structure films with either
of the other groups were not significant.

As in Study 1, a significant increase in
concept acquisition was demonstrated for all
treatment groups. The group that viewed
high-structure films changed from a pretest
mean of 89.7 to a posttest mean of 99.4, a
difference that was highly significant, F(1,
9) = 19.69, p < .01. The group that viewed
low-structure films changed from a pretest
mean of 90.2 to a posttest mean of 104.6, a
difference that was highly significant, F(1,
9) = 30.38, p < .01. And the group that
viewed a combination of high- and low-structure
films changed from a pretest mean
of 89.5 to a posttest mean of 102.5, a difference
that was also highly significant, F(1, 9)
= 82.50, p < .01.

Trainee reactions to the instructional
characteristics of the films were more favorable
in the high-structure group for 12 of
13 items. For 7 items, the differences in
item means were statistically significant (p
< .10). Pair-wise comparisons of means
using Tukey’s HSD test resulted in significant
differences (p < .10) for each of the 7
items between the group that viewed the
high-structure films and the group that
viewed the low-structure films. On only 2
items did the pair-wise comparisons of
means involving the group that viewed the
combination of high-structure and low-structure
films result in a significant difference
(p < .10) from either of the other film
groups.

Based on reactions to the above 7 items,
the group that viewed the high-structure
films agreed more strongly with the following
statements: “The important concepts were
clearly illustrated in the films”; “The content
of the films was easy to the point of being
obvious”; “I could easily see how the films


4 One trainee failed to respond to the items. 
were related to one another”; “Generally
speaking, the individual films were arranged
in a way that made them easy to follow.”
The group that viewed the high-structure
films disagreed more strongly with the following
statements: “The purpose of the
films was never clear to me”; “The content
of the films was sometimes confusing”; “We
needed more opportunities to apply the
content of these films.”

Compared to the use of high-structure or
low-structure films alone, the combined use
of high- and low-structure films in Study 2
was not shown to be of differential advantage.
In its replicative aspects, however, the
major findings of Study 2 were consistent
with the findings of Study 1: All three
treatments resulted in significant gains in
concept acquisition, while the trainees
reacted significantly more favorably to the
high-structure films. In addition, with
specific reference to the comparative results
for the high-structure and low-structure
groups, the significantly higher concept acquisition
posttest mean for the low-structure
group was consistent with a similar trend
apparent in the results of the first study.


Study 3 


There was no significant difference, F(1, 
9) < 1.00, p > .10, in concept acquisition 
posttest means between the group viewing 
high-structure films (M = 101.7) and the 
group viewing low-structure films (M = 
100.2). There was a significant increase in 
concept acquisition by both groups during 
the instructional treatment. The group that 
viewed the high-structure films changed 
from a pretest mean of 90.7 to a posttest 
mean of 101.7, a difference that was highly 
significant, F(1, 9) = 13.28, p < .01. The
group that viewed the low-structure films
changed from a pretest mean of 91.0 to a
posttest mean of 100.2, a difference that was
also significant, F(1, 9) = 6.41, p < .05.

These results indicate clearly that the
apparent differences in posttest means favoring
the low-structure film group in the
previous two studies were not solely attributable
to the design characteristics of the
films themselves. Further, the differences
may not have been highly reliable due to a
question regarding the most appropriate
unit of analysis. To the extent that these 
were reliable differences, they were attrib- 
utable to either the differential effect of 
discussion among the trainees in the low- 
structure and high-structure groups or to the 
interaction of this discussion with the rela- 
tive structure of the films. The results also 
indicate, however, that reliable gains in 
concept acquisition in both high-structure 
and low-structure groups are replicated 
when unusual conditions are imposed on the 
instructional treatment to assure the inde- 
pendence of posttest scores within groups. 

With reference to trainee reactions to the 
instructional characteristics of the films, 
there continued to be evidence that the 
high-structure films were viewed more fa- 
vorably than the low-structure films. Dif- 
ferences in means favoring the high-struc- 
ture films were found for 9 of 13 items re- 
lating to the instructional value, conceptual 
clarity, and design effectiveness of the films. 
For 5 items, the differences in item means 
were statistically significant (p < .10). The 
group that viewed the high-structure films 
agreed more strongly with the following 
statements: “The important concepts were 
clearly illustrated in the films"; "The content 
of the films was easy to the point of being 
obvious”; “I could easily see how the films 
were related to one another"; “Generally 
speaking, the individual films were arranged 
in a way that made them easy to follow." 
The group that viewed the high-structure 
films disagreed more strongly with the 
statement “The content of the films was 
sometimes confusing." These significant 
trends in item means were also found in 
Study 2. The directionality of trends in the
item means for the other 8 items was very
similar to that in the first two studies, except
that there was an overall tendency to
react to the films less positively, perhaps
because of the constraints on discussion.
This was evident on such items as “I feel I 
learned a lot from these films,” “The films 
were worth the time we spent on them,” and 
“The films held my interest.” 


Summary and Implications 


In terms of concept acquisition, the results 
of this series of studies did not demonstrate 
a differential advantage for protocol films
designed essentially as concept teaching 
materials (high structure) nor for those de- 
signed essentially as materials to provide 
practice in the interpretive use of concepts 
(low-structure). Thus, the design charac- 
teristics associated with degree of structure 
had no discernible influence on acquisition 
of the specified concepts. Further, the use 
of both types of films in combination was not 
of demonstrated advantage. Instruction 
incorporating either high-structure or low- 
structure films alone or both types together 
led to reliable and approximately equal gains 
in acquisition of the behavioral concepts 
basic to the films. Thus, the interactive ef- 
fects that might be expected to follow from 
viewing clearly identified examples of con- 
cepts in a simplified setting in conjunction 
with practice in identifying instances of the 
same concepts in complex behavior were not 
demonstrated. This finding in particular is 
at variance with guidelines for the design of 
protocol films that are suggested by studies 
in concept learning (e.g., Hudgins, Note 3). 

These results indicate that there are rea- 
sonable degrees of latitude in the design of 
protocol films that can be used effectively for 
concept acquisition. Technical or aesthetic
considerations, for example, may be allowed 
to influence degree of structure with the 
expectation that a well-designed film with 
either high or low structure can contribute 
to gains in concept acquisition. The authors 
presently are gathering data to test the hy- 
pothesis that two content characteristics of 
protocol films are instrumental to gains in 
concept acquisition: explicit definition and 
unambiguous instancing of concepts. In the 
present article, a well-designed film is defined
partly as one possessing these characteristics.

In contrast to the results for concept ac- 
quisition, there were reliable differences 
between two groups in perceptions of the 
instructional treatment: Reactions of the 
trainees were consistently more favorable to 
the high-structure than to the low-structure 
films. To a highly reliable degree, the 
trainees perceived the high-structure films 
as possessing greater conceptual clarity and 
less content and structural difficulty. To 
the extent that more generally favorable 
trainee reactions are judged to be important,
it would seem wise to build a greater degree
of structure into a protocol film. It is possible
also that such perceived qualities as
greater clarity and less difficulty contribute
directly to consumer acceptance and use.
Since protocol films generally are produced
with extensive instructional use as an objective,
factors that may contribute to acceptance
are clearly important.

Of some interest was the observation in
these studies that the use of low-structure
films stimulated a higher incidence of task-related
discussion among trainees. Evidence
from responses on the evaluation scale
indicated that the absence of formal structuring
cues in the low-structure films made
the task of categorizing the filmed behaviors
more difficult. This task demand, in turn,
may have stimulated a higher incidence of
discussion directed at resolving uncertainty
about the correct categorization of the observed
behaviors. Thus, it is possible that
a low degree of structure, by stimulating increased
problem-centered discussion among
motivated trainees, may indirectly result in
greater increments in concept acquisition.
This possibility remains an interesting 
question for further investigation. 

Finally, acquisition of the actual teaching 
behaviors referred to by the concepts in this 
series of studies ought to be explored. Since
there is direct evidence that protocol-based
instruction similar to that in these studies
can result in acquisition of skills as well as
concepts (Kleucker, Note 2), further research
might profitably focus on the effect
of specific content, design, and outcome dimensions
of training on skill acquisition.
While degree of structure might logically be
one of the dimensions investigated, the authors
view level of concept acquisition (i.e.,
level of concept mastery) as one of the more
promising dimensions to explore. In a current
study, data are being gathered on the
hypothesis that the level of concept mastery
achieved through protocol-based training is
predictive of skill acquisition.
Effects of Visual and Auditory Distractors on
Learning Disabled and Normal Children's
Recognition Memory Performance
This research investigated whether distractibility in learning disabled children
could be predicted on the basis of diagnosed visual and auditory learning
deficits. Twenty-six children in Grades 2, 3, and 4 were classified as having
visual or auditory reading disorders. They and 17 normally achieving children
performed visual and auditory recognition memory tasks, with visual or
auditory distractors presented on 80% of the trials. Analysis of error frequencies
revealed that with distractors, children in the two learning disabled
groups made more errors and did not improve over trials as much as the normal
control children. However, the predicted interaction between learning
disability modality and task or distractor modality did not obtain. Rather, all
three subject groups made more errors when task and distractor were in the
same modality.


Learning disabled children are often de- 
scribed as being distractible. In fact, many 
clinicians and practitioners accept distrac- 
tibility as a major etiological characteristic 
of these children (e.g., Cruickshank & Paul, 
1971). In the school setting, the conse- 
quences of this thinking often are evident in 
the highly structured design of classrooms 
and programs for the learning disabled. At 
the same time, existing research has failed to 
document the pervasiveness of distractibility 
within this population or to support the ef- 
ficacy of environmental restriction as a 
means of enhancing these children's learning 
(see review by Doleys, 1976). In effect, very 
little is actually known about the parameters 
of distractibility in this group of children. 

There are two major problems associated 
with this area of investigation. First, a ma- 
jority of research with learning disabled 
children has been directed at the hypothesis 
that the class as a whole is hyperdistractible 


This article is based on a dissertation submitted by 
the first author to the Department of Psychological 
Sciences, Purdue University, in partial fulfillment of the 
PhD degree. Funding for this research was provided
through a grant from the David Ross Research Foun- 
(e.g., Hunter, 1971; Vernetti & Jacobs, 1972).
This is unfortunate because the range of 
problems included within the classification 
is extremely broad, and it is unlikely that 
specific attentional deficits common to all 
disabled learners can be identified simply 
because variability within the general pop- 
ulation is too great (Torgeson, 1975). The 
question is not whether learning disabled 
children are more distractible. Rather, a
major focus of this research should be the 
identification of subject variables that are 
useful for predicting distractibility within 
the population. Research with normal 
children indicates that a number of personal 
factors may be related to attentional processes,
including cognitive style (Blowers,
1976), chronological age (Perelle, 1976) and
auditory-visual preferences (Perelle, 1976).
Whether any of these variables might pro- 
vide a basis for predicting distractor effects 
in learning disabled children remains to be 
determined. 

A second problem in the research concerns 
the need to develop a more comprehensive 
conceptualization of distractibility. Widely 
publicized clinical descriptions of learning 
disabled children have contributed to the 
acceptance of distractibility as a static trait. 
On the other hand, among normal children 
and adults, individual response to extraneous
stimulation has been shown to vary in
relation to such contextual factors as task
difficulty (Sen & Chowdhury, 1974), stimulus
novelty (Hutt, 1975), and relatedness of
task and distractor (Carriers & Fite, 1977).
It is reasonable to assume that situational
factors affect the attending behavior of
learning disabled children as well.

In general, then, the bulk of evidence from 
research with normal populations indicates 
that the structure of distractor effects is 
complex, involving the interaction of subject, 
task, and distractor variables. If prediction
and control are to be viable goals, then we
need to determine for which children and
under what circumstances distraction occurs.
The purpose of the present research is to 
investigate the usefulness (hence, the va- 
lidity) of one particular model of learning 
disability for predicting distractor effects. 
Often, the achievement problems of learning 
disabled children are attributed to specific
central nervous system dysfunction involving
the visual and auditory modalities
(Chalfant & Scheffelin, 1969). Implicit in
this modality model is the assumption that
each modality represents a separate and
unique information processing system.
Presumably, a deficiency in either modality
creates a learning disorder. If this is the
case, then the diagnosis of modality-specific
learning deficits should also provide a basis
for predicting the kinds of stimuli that will
be most distracting. That is, irrelevant
stimulation presented to the learning disabled
child's weaker auditory or visual
modality should interfere with performance
because the child is unable to effectively
“shut out” that information (Kirk & Kirk,
1971).

To carry out the study, learning disabled 
children were classified as having either 
visually-based or auditorily-based reading 
deficits. Each child then performed visual 
and auditory recognition memory tasks 
under conditions of visual and auditory 
distraction. Support for the model was ex- 
pected to be obtained if the performance of 
children within each classification was 
poorer when tasks and distractors involved
the deficient modality. Specifically, auditorily
disabled learners were expected to
perform most poorly when task and distractor
were auditory. The visually disabled
learners were expected to show the same
trend, that is, to perform most poorly when
task and distractor were visual.


Method 
Subjects 


Children in Grades 2, 3, and 4 were the subjects for 
this study. All were from a large, consolidated ele- 
mentary school in west central Indiana. The children 
classified as learning disabled (LD) met three criteria: 
(a) average to above-average intelligence (Wechsler
Intelligence Scale for Children IQ of 90 or higher); (b) 
no obvious or severe motor, sensory, and emotional 
impairment (according to school records and personnel); 
and (c) a significant discrepancy between actual and 
expected reading achievement. The latter was deter- 
mined by comparing a child’s reading comprehension 
score on the Gates-MacGinitie Reading Tests (Gates 
& MacGinitie, 1965) with an “expected” score derived 
from the following formula: (grade level + 1) X IQ. A 
significant discrepancy was defined as a deficiency in 
reading comprehension of at least 1.0, .75, or .66 years 
for children in Grades 4, 3, and 2, respectively (Bond & 
Tinker, 1973). Children in a normal control group (NC) 
differed only with respect to the last criterion; their 
expected achievement did not exceed actual achieve- 
ment. 
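The grade-specific discrepancy criterion can be sketched as a simple threshold check. This is a minimal illustration only: the function name and the example scores are hypothetical, and the expected score is taken as an input rather than computed, so no assumption is made here about how the expected-score formula is evaluated.

```python
# Minimum reading deficiency (in years) counted as significant, by grade,
# as stated in the text (Grades 4, 3, and 2: 1.0, .75, and .66 years).
MIN_DEFICIENCY_YEARS = {2: 0.66, 3: 0.75, 4: 1.0}


def has_significant_discrepancy(grade: int, expected: float, actual: float) -> bool:
    """Return True if actual reading achievement falls short of the
    expected score by at least the threshold for the child's grade.

    `expected` and `actual` are grade-equivalent scores in years;
    how `expected` is derived is outside the scope of this sketch.
    """
    deficiency = expected - actual
    return deficiency >= MIN_DEFICIENCY_YEARS[grade]


# Hypothetical scores, for illustration only.
print(has_significant_discrepancy(4, expected=4.5, actual=3.2))  # deficiency of 1.3 years
print(has_significant_discrepancy(3, expected=3.5, actual=3.0))  # deficiency of 0.5 years
```

Under this rule a control child is one whose deficiency is zero or negative, that is, whose expected achievement does not exceed actual achievement.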

The LD children were also classified as to the sensory 
modality of their learning deficit as determined by 
scores on the Visual Memory, Visual Closure, Auditory 
Memory, and Auditory Closure subtests of the Illinois 
Test of Psycholinguistic Abilities (ITPA; Kirk, 
McCarthy, & Kirk, 1968). A deficit was defined as 
when one modality score (e.g., the composite visual 
subtest score) was at least one standard deviation below 
the mean standard score for that child's age group, and 
the other modality (e.g., the composite auditory subtest 
score) was equal to or higher than the standard age 
score. The children with visual information processing 
deficits were in one group (visual learning disabilities, 
or VLD), while the children with auditory information 
processing deficits were in the other group (auditory 
learning disabilities, or ALD). 
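The modality classification rule just described can be sketched as follows. The mean of 36 is the mean standard ITPA composite score given in the note to Table 1; the standard deviation of 6 is an assumption made for this illustration, since the text does not state it, and the function name is hypothetical.

```python
from typing import Optional

ITPA_MEAN = 36.0  # mean standard composite score (note to Table 1)
ITPA_SD = 6.0     # ASSUMPTION: not stated in the text; adjust as appropriate


def classify_modality(auditory: float, visual: float,
                      mean: float = ITPA_MEAN, sd: float = ITPA_SD) -> Optional[str]:
    """Apply the stated deficit rule to two composite scores.

    A deficit exists when one modality is at least one SD below the mean
    and the other modality is at or above the mean. Returns 'ALD', 'VLD',
    or None when neither pattern holds.
    """
    cutoff = mean - sd
    if auditory <= cutoff and visual >= mean:
        return "ALD"
    if visual <= cutoff and auditory >= mean:
        return "VLD"
    return None


# Group means from Table 1, used here only to illustrate the rule.
print(classify_modality(auditory=28.19, visual=38.65))
print(classify_modality(auditory=38.31, visual=29.12))
print(classify_modality(auditory=38.85, visual=39.47))
```

A child with low scores in both modalities, or with a low score in one and a merely below-average score in the other, is left unclassified by this rule, which is consistent with only 26 of the 65 screened children qualifying.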

Sixty-five LD children were screened to obtain the 
26 subjects who met the modality criteria (all children 
who qualified were included). Screening data for these 
subjects is summarized in Table 1. Thirteen children 
(7 boys and 6 girls) had an auditory learning disability 
(ALD) and 13 (7 boys and 6 girls) were visually disabled 
(VLD). The children ranged in age from 7 years 3 
months to 10 years 0 months (mean chronological age 
= 8 years 6 months). Their IQ scores ranged from 94 
to 135 (mean IQ = 108). 

The control group was comprised of 17 children (10 
boys and 7 girls) selected from the same grades and 
classrooms as the children in the two LD groups. None 
of these children had a lower-than-expected auditory 
or visual modality score (see Table 1). The age range 
of these children was 7 years 7 months to 9 years 9 
months (mean chronological age = 8 years 7 months),
Table 1
Mean Screening Data Summarized for Groups

                                          Subject group
Variable                          ALD            VLD            NC

ITPA composite auditory score     28.19          38.31          38.85
ITPA composite visual score       38.65          29.12          39.47
Reading discrepancy               -1.26 yr.      -1.15 yr.      +1.14 yr.
WISC IQ                           109            106            111
Chronological age                 8 yr. 6 mo.    8 yr. 5 mo.    8 yr. 7 mo.


Note. For subject groups, ALD = auditory learning disabili- 
ties, VLD = visual learning disabilities, and NC = normal 
control. For tests, ITPA = Illinois Test of Psycholinguistic 
Abilities and WISC = Wechsler Intelligence Scale for Children. 
The mean standard ITPA composite score is 36. 


their IQ scores ranging from 94 to 129 (mean IQ = 
111). 


Design 


The experimental design was a 3 X 2 X 2 factorial, 
with one between-subjects factor (three subject group 
classifications: ALD, VLD, and NC) and two within- 
subjects factors (two task modalities and two distractor 
modalities), which yielded four task conditions: (a) 
auditory task with an auditory distractor (AA), (b) au- 
ditory task with a visual distractor (AV), (c) visual task 
with an auditory distractor (VA), and (d) visual task 
with a visual distractor (VV). The dependent variables 
were error frequency and response latency. 
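The layout of the design can be enumerated directly. The sketch below simply crosses the factor levels named above; the variable names are illustrative.

```python
from itertools import product

GROUPS = ("ALD", "VLD", "NC")   # between-subjects factor
TASKS = ("A", "V")              # within-subjects: task modality
DISTRACTORS = ("A", "V")        # within-subjects: distractor modality

# Each child completes all four task conditions (task letter + distractor letter).
conditions = [task + distractor for task, distractor in product(TASKS, DISTRACTORS)]

# The full 3 X 2 X 2 design has one cell per group and condition.
cells = list(product(GROUPS, TASKS, DISTRACTORS))

print(conditions)
print(len(cells))
```

Conditions AA and VV are the intramodal cases, and AV and VA are the cross-modal cases.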


Tasks 


Task selection was a particularly critical aspect of the 
experiment. In order to minimize the potential con- 
founding of task difficulty and subject classification, it 
was necessary to employ a task that could be performed 
with similar success by the LD and NC children. The 
task selected was the continuous recognition memory 
task originally described by Shephard and Teghtsoon- 
ian (1961). In this task, stimulus items are presented 
one at a time to the subject, who indicates whether each 
item is “old” (one presented before), or “new” (not 
presented before). In addition to its simplicity, this 
task offered the advantage that stimuli could be pre- 
sented visually or auditorily with little change in format. 
Visual and auditory versions therefore could be equated 
to a high degree (Goldstein & Chance, 1974). 

The stimuli selected for the visual tasks were pictures 
of animals and plants, rear-projected onto the bottom 
half of a 51 X 76 cm viewing screen. The auditory task 
items were familiar nouns, tape recorded and played 
through a speaker concealed just under the viewing 
screen. There were 70 trials in each task, with 30 items 
presented twice, and 10 items occurring once. The first 
10 items were new. Items 11 through 70 were selected 
so that the probability of an item being either old or new
was .5. A maximum of 19 and a minimum of 10 items 
intervened between a new and old occurrence of an 
item. 
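The trial counts given above can be checked with a little arithmetic. The sketch below assumes only the figures stated in the text; the helper for the lag constraint is a hypothetical illustration, with trial positions numbered from 1.

```python
# Counts taken from the task description.
REPEATED_ITEMS = 30   # items shown twice (one "new" and one "old" occurrence)
SINGLE_ITEMS = 10     # items shown once (always "new")
LEADING_NEW = 10      # the first 10 trials are all new items

total_trials = 2 * REPEATED_ITEMS + SINGLE_ITEMS
new_trials = REPEATED_ITEMS + SINGLE_ITEMS
old_trials = REPEATED_ITEMS

# Proportion of old items among trials 11 through 70.
p_old_after_lead = old_trials / (total_trials - LEADING_NEW)


def lag_is_valid(first_trial: int, second_trial: int,
                 min_lag: int = 10, max_lag: int = 19) -> bool:
    """True if the number of items intervening between the new and old
    occurrence of an item lies within the stated bounds."""
    intervening = second_trial - first_trial - 1
    return min_lag <= intervening <= max_lag


print(total_trials, new_trials, old_trials, p_old_after_lead)
```

Because at least 10 items must intervene, no old item can appear within the first 10 trials, so all 30 old occurrences fall in trials 11 through 70 alongside the remaining 30 new occurrences.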

Either an auditory or a visual distractor was pre- 
sented with each task. The visual distractors were Walt 
Disney color cartoons, rear-projected onto the top half 
of the viewing screen. The auditory distractors were 
portions of a children’s story (“Alice in Wonderland”), 
tape recorded at faster-than-normal speed (from 33 
revolutions/min to 45 revolutions/min) and played 
through the same speaker as were the auditory task 
items. On the basis of pilot investigation, the dimen- 
sions of the cartoon image and the loudness level of the 
story were set at selected values (20 X 20 cm and 60 dB,
respectively) so that comparable error frequencies were 
produced. There was no distraction on the first 20 
trials. Thereafter, distraction occurred with a proba- 
bility of .8 (with an equal frequency for new and old 
items). The distractor was presented from 1 to 6 sec 
prior to the onset of a stimulus item. 
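The distractor schedule implies the following expected counts per task. This is a sketch of the arithmetic only; it treats the stated probability of .8 as yielding its expected number of distractor trials, which is an assumption about how the schedule was realized.

```python
TOTAL_TRIALS = 70
NO_DISTRACTOR_LEAD = 20   # no distraction on the first 20 trials
P_DISTRACTOR = 0.8        # probability of a distractor thereafter
BLOCK_SIZE = 10           # block size used when errors are examined over trials

eligible_trials = TOTAL_TRIALS - NO_DISTRACTOR_LEAD
expected_distractor_trials = round(P_DISTRACTOR * eligible_trials)

# Distractors were paired equally often with new and old items.
expected_per_item_type = expected_distractor_trials // 2

# Number of 10-trial blocks the distractor trials divide into.
n_blocks = expected_distractor_trials // BLOCK_SIZE

print(eligible_trials, expected_distractor_trials, expected_per_item_type, n_blocks)
```

Forty distractor trials divide evenly into four blocks of 10 trials.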


Procedure


Each subject was tested individually in a soundproof 
room inside a mobile laboratory. In an initial session,
the screening procedures were administered. The four 
treatment conditions were administered subsequently, 
each on a different day. Each child was assigned to one 
of four task presentation orders: (a) AA, VV, AV, and 
VA; (b) VV, AA, VA, and AV; (c) AV, VA, AA, and VV; 
and (d) VA, AV, VV, and AA.
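The four presentation orders can be checked for counterbalancing. The sketch below tests whether they form a Latin square, meaning each condition occurs exactly once in each serial position; the text does not use that term, so the label is an interpretation, and the function name is illustrative.

```python
# The four task presentation orders listed in the text.
ORDERS = [
    ("AA", "VV", "AV", "VA"),
    ("VV", "AA", "VA", "AV"),
    ("AV", "VA", "AA", "VV"),
    ("VA", "AV", "VV", "AA"),
]


def is_latin_square(orders) -> bool:
    """True if every condition appears once per order (row) and
    once per serial position (column)."""
    conditions = set(orders[0])
    n = len(conditions)
    if len(orders) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == conditions for row in orders)
    cols_ok = all(set(col) == conditions for col in zip(*orders))
    return rows_ok and cols_ok


print(is_latin_square(ORDERS))
```

With each condition appearing once in every serial position, practice and fatigue effects are spread evenly over the four conditions across children.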

For each task, the child was seated at a desk facing the 
viewing screen, and the appropriate instructions were 
presented. In general, these informed the child that a
series of words or pictures would be presented one at a
time, and that the child must decide whether each item
was new (a “no” item) or old (a “yes” item). The child
pressed a key labeled “YES” or “NO” to indicate his or
her answer. In addition, the child was told that no
feedback would be given and that some extra pictures
or sounds would occur sometimes. At this point, the
child was specifically instructed to regard such events
as irrelevant and attend only to the task stimuli. Ten
practice trials (with no distractors) were administered
to ensure that the child understood the directions. The 
time required to administer each task was approxi- 
mately 15 minutes. 


Results 


Preliminary analyses of the subject groups
indicated that they were comparable in both
age, F(2, 40) = .30, p > .05, and IQ, F(2, 40)
= .99, p > .05. In addition, the reading
comprehension deficits of the two LD groups
were not significantly different. Analysis of
sex differences for no-distractor and distractor
trials indicated that the sex-of-subject
effects were not significant.
Therefore, this factor was not included in the
analyses reported below.
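The degrees of freedom reported in this section follow from the sample sizes given under Subjects (26 LD children plus 17 controls, or 43 in all). The sketch below uses the standard formulas for a design with one between-subjects grouping factor and a single within-subjects factor; the function names are illustrative.

```python
N_SUBJECTS = 26 + 17  # 13 ALD + 13 VLD + 17 NC


def between_df(n_subjects: int, n_groups: int) -> tuple:
    """(effect df, error df) for a between-subjects group effect."""
    return n_groups - 1, n_subjects - n_groups


def within_df(n_subjects: int, n_groups: int, n_levels: int) -> tuple:
    """(effect df, error df) for a within-subjects factor with
    `n_levels` levels, with subjects nested in `n_groups` groups."""
    effect = n_levels - 1
    error = (n_subjects - n_groups) * (n_levels - 1)
    return effect, error


print(between_df(N_SUBJECTS, 3))    # three-group comparisons
print(within_df(N_SUBJECTS, 3, 2))  # two-level task or distractor effects
print(within_df(N_SUBJECTS, 3, 4))  # four trial blocks, three groups
print(within_df(N_SUBJECTS, 2, 4))  # four trial blocks, LD groups combined
```

The same arithmetic accounts for the change in error degrees of freedom in the trial-block analysis when the two LD groups are combined into one.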


Table 2
Means and Standard Deviations for Errors and Raw Latencies for No-Distractor Trials

The two dependent variables, number of
errors and response latencies, were analysed
separately using repeated measures analyses
of variance (ANOVAs). The first analyses
(three subject groups by two tasks) were for
the first 20 trials on which no distractors
were presented. The group means are in
Table 2. Neither the groups main effect nor
the Groups X Task interaction was significant
for either dependent measure, indicating
that children in the three groups performed
similarly on no-distractor trials.
Both task main effects were significant: for
errors, F(1, 40) = 5.08, p < .05; for latency,
F(1, 40) = 237.64, p < .05. There were more
errors, and latencies were longer for the visual
tasks.

A 3 X 2 X 2 repeated measures ANOVA
(Subject Groups X Task Modality X Distractor
Modality) was computed on the raw
error data from distractor trials. The
means are shown in Table 3. This analysis
yielded significant main effects for groups,
F(2, 40) = 16.04, p < .01, and distractors,
F(1, 40) = 25.05, p < .01. Both LD groups
made more errors than did the NC group (p
< .01), but they did not differ from each
other (all intergroup comparisons were
computed using the Newman-Keuls procedure).
The children in each group made
more errors with the visual distractors (mean
= 6.90) than with the auditory distractors
(mean = 5.20).

The Groups X Tasks interaction also was
significant, F(2, 40) = 7.24, p < .01. Children
in the ALD group made more errors on
the auditory tasks (p < .01), and the NC
group made more errors on the visual tasks
(p < .01). Finally, the Tasks X Distractors
interaction was significant, F(1, 40) = 33.87,
p < .01, the most errors occurring when the
distractor and task were in the same
modality (AA and VV; p < .01).

1 Initially, the distractor trials error data were to be
considered in terms of a signal detection analysis.
However, the calculation of d’ for individual subjects
was not practical, since many children made no errors
or had perfect hit rates. A comparison of error-type
proportions between the three subject groups indicated
that the ratios of false alarms to missed recognitions
were not significantly different. Thus, the group differences
in overall error frequency were assumed to
reflect changes in item detectability.
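The difficulty with computing d’ for individual subjects can be made concrete. The sketch below uses the standard equal-variance formulation, d’ = z(hit rate) - z(false alarm rate), which is an assumption here since the text does not give the formula; the function name and the example rates are hypothetical.

```python
from statistics import NormalDist, StatisticsError
from typing import Optional


def d_prime(hit_rate: float, false_alarm_rate: float) -> Optional[float]:
    """Return d' under the equal-variance Gaussian model, or None when
    it is undefined because a rate is exactly 0 or 1."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    try:
        return z(hit_rate) - z(false_alarm_rate)
    except StatisticsError:
        # inv_cdf is defined only for probabilities strictly between 0 and 1.
        return None


# Hypothetical rates, for illustration only.
print(d_prime(0.9, 0.1))   # defined
print(d_prime(1.0, 0.1))   # perfect hit rate: undefined
print(d_prime(0.9, 0.0))   # no false alarms: undefined
```

A child with a perfect hit rate or no false alarms yields an infinite z score, so no finite d’ exists without some correction to the observed rates.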
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In order to investigate the effect of dis- à 


> 
$ tractors over trials, a 3 X 2 X 4 repeated 
SEE SETAE E measures ANOVA (Groups X Distractors X 
25|; 22 EA) s 4 Blocks of 10 Trials) was computed. Asin 
8 E the previous analysis, the main effects for 
E groups and for distractors still were signifi- | 
2 >| 22 929 = cant. The main effect for trials also was 
h| 2 iB: significant, F(3, 120) = 20.56, p < .01. In 
als E general, errors declined over trials. 
ais] 88 83 1 The Distractors X Trials interaction also bs 
E E $ was significant, F(3, 120) = 16.98, p < .01, 
| eg val £8 and the data indicated that error patterns for ™ 
E e Y pe: pee me the two distractors were different. Withan | 
E 32 auditory distractor, errors increased in the l 
E Sg $5| 23 second block of trials, then decreased. For 
Sal E2 a visual distractor, the error mean in the first 
E E E moy of trials was highest, declining there- 
8 55 after. 
c ES 3 ki 88 “4 The interaction of groups by trials was not | 
2 8 : significant in the analysis. However, when 
E Ba the data for the two LD groups were com- |, 
$ E "T EH bined, a significant Groups X Trials inter- e 
B 3. E E pd Se) EE action was obtained, F(3, 123) = 3.44, p € 
ki 3 s EE .05. The means for this interaction are 
A) als <| 88 gu oe shown in Table 4. Errors decreased signif- 
S | 8 vai E $ Tip, between Trial Blocks 2, 3, and 4 for 
$| JẸ (dá e group. For the LD group, errors | 
à| [8 E| 88 83 3 È decreased significantly only in the last block 
v È zi of trials (p < .01). Error reduction appar- 
4 sz Ex| $5 ently began later for the LD children, whose 
S aos E A performance did not reach as high a level as h 
B 25 the normal control children. 
S H ii Latency data were represented by a 
3 RS z à Rl Fs change score derived by subtracting a child’s 
3 8 ES mean latency for correct no-distractor trials 
E pu from the mean latency for correct distractor 
g à T ENS EA trials. A 3 X 2 X 2 repeated measures 
El gl. E| $8 Sud ed ANOVA (Groups X Tasks X Distractors) 
n AE LŠ was performed on this data, yielding only 

Table 4
Mean Errors and Standard Deviations for
Groups X Trials Interaction

                          Trial block
Subject group         1      2      3      4

Normal control
  M                  285    250    182     56
  SD                 ...    ...    ...    ...
Learning disabled
  M                 [?]87   410    354    280
  SD                 119    143    115    102

significant main effects for tasks, F(1, 40) =
21.05, p < .01, and distractors, F(1, 40) =
11.72, p < .01. Thus, the greatest change
occurred with the AV treatment condition,
while the smallest change occurred in the VA
condition.
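
The latency change score described above is a simple difference of means, and a short sketch may make the computation concrete. The function name and the latency values below are hypothetical and serve only as illustration; they are not data from the study.

```python
from statistics import mean


def latency_change(distractor_latencies, baseline_latencies):
    """Change score: mean latency on correct distractor trials
    minus mean latency on correct no-distractor trials."""
    return mean(distractor_latencies) - mean(baseline_latencies)


# Hypothetical latencies (in seconds) for one child; illustrative only.
with_distractor = [1.5, 1.75, 2.0]
no_distractor = [1.25, 1.5, 1.0]

change = latency_change(with_distractor, no_distractor)
print(change)  # a positive value means slower responding under distraction
```

A positive score indicates that the child responded more slowly when a distractor was present; a score near zero indicates little change.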

Finally, correlations were computed be- 
tween error frequency and the latency 
change score. The resulting correlation was 
not significant for the NC group (r = —.01, 
p > .05). On the other hand, errors and latency
change were related significantly for
the two LD groups. The relationship was 
positive for the VLD group (r = .62, p < .05) 
and negative for the ALD group (r = —.48, p 
< .05). Apparently, the errors made by 
children in the three groups were committed 
under different circumstances. 

In summary, then, children in the LD 
groups made more errors with distractors 
than did children in the NC group. In gen- 
eral, these errors decreased over trials. 
However, the rate of improvement was less 
pronounced for the learning disabled chil- 
dren. For all groups, more errors were made 
with visual distractors. At the same time, 
the number of errors associated with either 
distractor was greater for the intramodal 
treatment condition (AA or VV). Latency 
change score data did not parallel the error 
data, the two apparently being related in 
different ways for the two LD groups. 


Discussion 


The present study was relevant to two 
areas of concern: the general effect of dis- 
tractors among learning disabled children 
and the specific effects of distractors on 
different types of LD children. The data 
reported above support the general view that 
distractibility occurs with greater frequency 
among learning disabled children. Dis- 
tractors clearly interfered more with the 
recognition task performance of the learning 
disabled children, who made nearly twice as 
many errors on distractor trials as did the 
normal children. Furthermore, while there 
was improvement over trials for all groups, 
the normal control children improved 
sooner, attaining nearly perfect performance 
by the end of the task. In contrast, the improvement
for the learning disabled children
was much less pronounced and started 
later. 

The findings were less clear with respect 
to the main hypothesis of the study. As 
predicted by the modality deficit model of 
learning disability, the greatest distractor 
effects should have occurred in the two LD 
groups whose tasks included the deficient 
modality. The failure to obtain significant 
interactions between subject classification 
and distractor or treatment conditions did 
not support this hypothesis. On the other 
hand, the data in many respects were con- 
sistent with the modality distinctions. For 
example, the total errors and improvement 
over trials differed with the two distractors, 
and more errors occurred with each dis- 
tractor in the intramodal condition. Thus, 
the tasks and distractors were different, and 
the differences corresponded to auditory- 
visual procedural differences. Additionally, 
the mean errors for each task were ordered 
as predicted for the two LD groups. The 
children in the ALD group made more errors 
on the auditory tasks, and the VLD group 
had more errors (though not significantly 
more) for the visual tasks. Ostensibly, this 
would suggest that the modality-based 
classification of subjects was valid. How- 
ever, this conclusion was complicated by the 
fact that the control children also made more 
errors on the visual tasks. Apparently, the 
visual tasks were more difficult generally. If 
this is the case, it is reasonable to assume 
that, had the tasks been of equal difficulty, 
the task effect would be magnified for the 
ALD group but reduced with the VLD children.
In effect, only the performance of the
ALD group would be consistent with the 
model. Nevertheless, the two LD groups 
were different with respect to their relative 
performance on the two tasks. The question 
is, What factor(s) might account for the 
differences? 

One possibility is that the VLD children 
were initially misclassified and, in reality, 
were not different from the control children 
in terms of perceptual skills. Yet, this was 
unlikely, since the correlations between error 
rate and latency change for the three groups 
(positive for VLD, negative for ALD, and 
zero for NC) differed. There were qualitative
differences in performance among the
groups.

Another possibility is that the hypothesis 
of separate auditory and visual information 
processing channels is inadequate. Indeed, 
efforts to prescribe curricula on such a basis 
have been largely unsuccessful (Arter & 
Jenkins, 1977). A more accurate con- 
ceptualization of modality might be based 
upon the notion of attribute processing ca- 
pability. Thus, the two groups of learning 
disabled children in this study might have 
differed in their ability to deal with "temporal"
as opposed to "spatial" attributes
(Guyer & Friedman, 1975). A careful ex- 
amination of the ITPA subtests used here 
suggests this might be a reasonable possi- 
bility. That is, the items for both auditory 
subtests are administered sequentially, while 
items in the visual subtests actually are 
presented as entire spatial units. If this is 
the case, then neither the normal nor the VLD
children had a temporal deficit. The rec- 
ognition memory task more closely resem- 
bles a temporal task (to the extent it can be 
categorized in a temporal-spatial dimen- 
sion). Thus, only the ALD children were 
different with respect to the present experimental
task, and their performance reflected
that difference. This possibility
should be examined further by covarying
visual-auditory and temporal-spatial
tasks.

One final point merits attention. From a 
clinical perspective, adequately defining the 
nature of distractibility is just as important 
as the accurate prediction of distractor ef- 
fects. Distractibility has often been viewed 
as an involuntary behavior, to be treated via 
isolation and/or medication. Alternatively, 
distractibility can be conceptualized as a 
cognitive deficit, for example, the absence of 
or failure to utilize appropriate attentional 
strategies (Torgeson, 1977). Several aspects 
of this study were consistent with the latter 
view. For one thing, the finding that per- 
formance improved over trials for all groups 
indicates that learning occurred. Second, 
each child was questioned informally about 
the distractors after each test session. Even 
the children who performed with the fewest 
errors were able to describe the distractors 
in considerable detail, indicating that successful
performance did not require the child
to ignore the distractors. The better per- 
formance of the NC group during distraction 
may have resulted because these children 
had more quickly or more effectively 
adopted a response strategy that enabled 
them to attend to both the task items and to 
the distractors. Further research should 
attempt to clarify the role of attentional 
strategies on learning disabled children’s 
learning and memory. 


Verbatim and Paraphrased Adjunct Questions and
Learning From Prose 


Thomas Andre and Sandra Womack 
Iowa State University 


College students read passages and answered either verbatim or paraphrased
adjunct questions either inserted in the text or massed at the end of the passage.
The students either were or were not permitted to review the passage to
answer the questions. On the later posttest containing unfamiliar para- 
phrased questions, students given inserted paraphrased adjunct questions 
outperformed the other students. The results were congruent with a rationale 
that holds that under appropriate conditions, paraphrased questions influ- 
ence the state of encoding of items of information presented in a passage. 


Current ideas about levels of processing 
seem to support the educator's intuitive 
beliefs in the value of higher-order questions. 
The basic levels-of-processing idea is that 
information may be processed to a greater or 
lesser depth; the greater the depth of pro- 
cessing, the better the retention (Anderson, 
1970; Craik & Lockhart, 1972). Anderson 
(1970) argued that readers may encode text 
at orthographic/phonological levels or at a 
deeper semantic level; semantic encoding 
would lead to better retention. Anderson 
(1972) suggested that the degree of semantic 
encoding could be assessed by using para- 
phrased questions that contained no sub- 
stantive words in common with the text and 
that the use of paraphrased adjunct ques- 
tions could increase readers’ semantic en- 
coding. Anderson and Biddle (1975) tested 
this latter hypothesis in four experiments 
but failed to find confirmation. 

Andre and Sola (1976) argued that paraphrased
questions would not facilitate performance
unless the readers could use the
questions to guide their encoding of particular
items of information. Anderson and
Biddle's procedures prevented that guidance
function. Using a sentence list learning
paradigm, Andre and Sola (1976) demonstrated
a facilitative effect of paraphrased
questions. The purpose of the present study
was to determine under what conditions
paraphrased questions would facilitate
learning from prose. 

Applied to the problem of adjunct ques- 
tions in prose, Andre and Sola's reasoning 
suggests that the effect of the type of ques- 
tion would be moderated by variables such 
as the amount of material intervening be- 
tween text and question and whether readers 
could review the text in order to answer. 
The argument would be as follows. When 
students read, they attempt to semantically 
encode the message, but for numerous reasons
(memory limitations, time pressures,
etc.) semantic encoding is not always
achieved. Accordingly, after a reading, a
particular item of information may exist in 
one of three states in a reader's memory. It 
may be: (a) not encoded at all, (b) less than 
semantically encoded (orthographically or 
phonologically encoded), or (c) semantically 
encoded. Paraphrased questions particu- 
larly influence retention by influencing the 
state of encoding of particular items of information.
Only if material is not originally
semantically encoded will paraphrased 
questions improve performance over ver- 
batim questions. Whether paraphrased 
questions will benefit performance vis-à-vis 
verbatim questions for less-than-semantically
encoded material depends upon factors
such as whether review is possible and the


1 This discussion ignores any effect of adjunct questions
except for the effect on the level of encoding of the
[...]. We assume that paraphrased and verbatim
questions would have similar effects on other
[...] influence retention.
Association, Inc. 0022-0663/78/7005-0796$00.75
amount of intervening material. An analysis
of the effect of paraphrased and verbatim 
questions under different combinations of 
these factors is given below. 

No original encoding, no review. If the 
item of information is not encoded at all and 
the reader is not permitted to review, adjunct 
questions will have no effect on the level of 
encoding. The student can answer neither 
a verbatim nor a paraphrased question, and 
the act of not answering will not increase the
encoding state.


Less than semantic original encoding, no 
review. Adjunct questions might influence 
the state of encoding in this situation. 
Whether such influence will occur depends 
upon whether the student can match the 
question with his memory trace of the particular
item of information. Because orthographic
and phonological features of a
verbatim question overlap with features of
the less-than-semantic memory trace, the
probability of a match (Pmatch) between text
and question will be greater with a verbatim 
than a paraphrased adjunct question. 
However, even with a paraphrased question 
some probability of a match exists, because 
the student can covertly recode the question 
into a verbatim form and find a match. 

Given that a match occurs with either type 
of question, there will be some probability 
that answering the question will lead to a 
recoding of the memory trace to a semantic 
level (P semantic recoding or Psr). The Psr
for a paraphrased question will always be 
higher than Psr for a verbatim question 
because the paraphrased question forces 
students to process meaning, whereas the 
verbatim question does not. 

Because Psr for paraphrased questions is
always greater than Psr for verbatim questions,
the overall effect of verbatim or paraphrased
questions will be determined by the
size of the difference between Pmatch for
verbatim (Pmatch-v) and Pmatch for paraphrased
(Pmatch-p) items. One variable affecting
Pmatch-v and Pmatch-p is the amount
of material intervening between the text and
the question. When the question quickly
follows the item of information in the text,
Pmatch-p will be only slightly smaller than
Pmatch-v, and thus paraphrased adjunct
questions will have a beneficial effect on
overall posttest performance. As increasing
amounts of material intervene between text
and adjunct question, Pmatch-p will decrease
more quickly than Pmatch-v. Thus, with
considerable intervening material, para- 
phrased questions will not produce a bene- 
ficial effect on later performance. 
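
The argument above can be sketched numerically by treating the chance that an adjunct question raises a trace to the semantic level as the product Pmatch x Psr. The sketch below is only illustrative: the geometric decline of Pmatch with intervening pages, the function names, and every parameter value are assumptions introduced here, not quantities estimated in the study.

```python
def p_match_after(p_match_immediate, decay_per_page, pages):
    """Assumed (hypothetical) geometric decline of Pmatch as more
    pages intervene between the text and the adjunct question."""
    return p_match_immediate * (1.0 - decay_per_page) ** pages


def p_recoded(p_match, p_sr):
    """Probability that a question is matched to its memory trace
    AND the trace is then recoded semantically: Pmatch x Psr."""
    return p_match * p_sr


# Hypothetical parameters chosen only to respect the orderings in the text:
# Psr is higher for paraphrased questions; Pmatch starts slightly lower for
# paraphrased questions and falls off faster with intervening material.
PSR = {"verbatim": 0.3, "paraphrased": 0.8}
P_MATCH_IMMEDIATE = {"verbatim": 0.9, "paraphrased": 0.8}
DECAY_PER_PAGE = {"verbatim": 0.05, "paraphrased": 0.25}


def paraphrase_advantage(pages):
    """Recoding probability for paraphrased minus verbatim questions."""
    result = {}
    for kind in ("verbatim", "paraphrased"):
        p_match = p_match_after(P_MATCH_IMMEDIATE[kind], DECAY_PER_PAGE[kind], pages)
        result[kind] = p_recoded(p_match, PSR[kind])
    return result["paraphrased"] - result["verbatim"]


for pages in (0, 3, 11):
    print(pages, round(paraphrase_advantage(pages), 3))
```

Under these assumed values the paraphrase advantage is positive when the question immediately follows the text and becomes negative once many pages intervene, which is the qualitative pattern the rationale predicts for inserted versus massed questions without review.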

Less than semantic original encoding or 
no encoding, with review of text. The op- 
portunity to reread the text will cause Pmatch 
to be quite high for both verbatim and par- 
aphrased items. This will be true regardless 
of the amount of intervening material. The 
reader should be able to locate the relevant 
portion of the text and answer the question. 
Because Pmatch-v and Pmatch-p will both be
high, paraphrased adjunct questions should 
facilitate overall posttest performance. 

This analysis leads to the following pre- 
dictions. When no review is permitted, 
paraphrased adjunct questions should fa- 
cilitate performance if they are inserted in 
the text but not if they are massed at the end 
of a text. When review is permitted, para- 
phrased adjunct questions should lead to 
superior performance regardless of position 
of the question. These predictions are valid 
only when a posttest consists of paraphrased 
versions of the items used as adjunct items 
and these paraphrased versions have not 
previously been seen by any of the subjects. 
If subjects have seen the posttest items 
previously, then their performance will be 
influenced by their previous response to the 
item. The present experiment was designed 
to test these predictions. 


Method 
Subjects 


Male and female undergraduate volunteers (N = 210) 
from psychology classes participated in the study and 
received extra course credit for their participation. 


Design 


The design of the primary experiment can be repre- 
sented by a 2 X 2 X 2 between-subjects factorial. 
Subjects either received verbatim or paraphrased ad- 
junct questions (type of question); the adjunct questions 
were either inserted in the text after the page that 
provided their answer or massed at the end of the pas- 
sage (question position); and the students either were 
or were not permitted to review the passage to find the 
answer to the question (review possible). Three additional
control groups were conducted: The first read
the passage without inserted questions and immediately 
took the posttest, the second took the posttest without 
reading the passage, and the third was asked to answer 
the posttest items by looking up the answers in the 
passage. The data from the primary experiment were 
analyzed using a 2 X 2 X 2 factorial analysis of variance;
comparisons with the control groups were made using
Dunnett's test.
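
As a compact summary of the design just described, the eight experimental cells and three control groups can be enumerated as follows. The level labels and variable names are shorthand introduced here for illustration; they are not the authors' own condition names.

```python
from itertools import product

# Three between-subjects factors, two levels each (labels are shorthand).
FACTORS = {
    "type_of_question": ("verbatim", "paraphrased"),
    "question_position": ("inserted", "massed"),
    "review_possible": ("review", "no review"),
}

# The three additional control groups described in the text.
CONTROLS = (
    "passage, then posttest (no adjunct questions)",
    "posttest only (passage not read)",
    "posttest answered by looking up answers in the passage",
)

# Crossing the factors gives the 2 X 2 X 2 experimental cells.
experimental_cells = list(product(*FACTORS.values()))
all_groups = [" / ".join(cell) for cell in experimental_cells] + list(CONTROLS)

for group in all_groups:
    print(group)
```

The enumeration yields 8 experimental cells plus 3 controls, or 11 groups in all.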


Materials 


The materials consisted of an approximately
5,000-word passage dealing with problems in international
communication. The passage was segmented
into 12 parts; each segment began and ended with a 
complete paragraph; and segments ranged from about 
275 to 600 words in length (M = 420). Each segment 
was reproduced on a single 8½ X 11 inch (21.6 X 27.9
cm) page. A target sentence was selected for each
segment. For each target sentence, three versions of
the same short-answer question were written. The 
verbatim version used, insofar as possible, the words in 
the original text to form the question. Each of two
paraphrased versions was written so as to contain, insofar
as possible, no lexical overlap with each other, the
verbatim version, or the original text. For example, if 
the target sentence stated, “connotation refers to the 
affective aspect of a communication,” the verbatim 
question might ask, “What refers to the affective aspect 
of a communication?”; whereas one paraphrased version 
might ask, “Which word means the emotional part of 
a message?” and the other paraphrased version might 
be, “The feeling component of a sentence is called its
____?” One of the paraphrased versions and the
verbatim version were selected for use as adjunct
questions. To counterbalance possible effects of
quality of paraphrase, the first written paraphrased
version was selected as the adjunct question for half of
the segments and the second written version for the
remaining segments. For the inserted-question conditions
the adjunct questions were interspersed between
the appropriate segments. For the massed conditions
they were placed at the end of the booklet.

The remaining paraphrased version was used on the posttest,
together with two other questions written for each segment [...].
Thus, a 36-item posttest was constructed consisting of
12 paraphrased items, 12 near items, and 12 far items.


The order of items within the posttest was random. 


Procedure 


The experiment was conducted in a large classroom
in the evening. Students were first given 4 min to read
a short passage unrelated to the communication passage
and 6 min to take a 24-item posttest on it. The purpose
was to assess the reading ability of the groups. Following
the reading test, booklets for the various conditions
were distributed in an unsystematic manner in
order to randomly assign subjects to conditions. Stu- 
dents were told to read their cover directions, to follow 
only those directions, and to ignore what other students 
were doing. When they finished reading their booklets, 
students recorded the number currently showing on a 
digital clocklike device at the front of the room. The 
device displayed 4 in. (10.2 cm) high numerals starting 
with 1 and incremented the displayed number by one
every 10 sec. Students worked at their own rate. When
they were done, the experimenter brought them the 
posttest and removed the booklet. Upon completing 
the posttest, the students were permitted to leave. All 
students were finished within 1.5 hr. 
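
The timing procedure lends itself to a small worked conversion. The sketch below assumes, as a simplification not stated in the text, that elapsed time is approximated by the recorded number multiplied by 10 sec; the function names and the example reading of 180 are hypothetical.

```python
SECONDS_PER_TICK = 10  # the display advanced by one every 10 sec


def ticks_to_minutes(recorded_number):
    """Approximate elapsed minutes for a recorded clock number
    (assumes elapsed time = recorded number x 10 sec)."""
    return recorded_number * SECONDS_PER_TICK / 60


def words_per_minute(word_count, recorded_number):
    """Reading rate implied by a word count and a recorded clock number."""
    return word_count / ticks_to_minutes(recorded_number)


# Hypothetical example: a student records 180 after reading the
# approximately 5,000-word passage.
minutes = ticks_to_minutes(180)
rate = words_per_minute(5000, 180)
print(minutes, round(rate, 1))
```

Because the display changed only every 10 sec, any time derived this way carries an uncertainty of up to one tick.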




Results 
Reading Test 


This test was designed to assess the
equivalence of the groups. Two measures
were analyzed: the number of correct responses
on the posttest and the word-per-minute
(wpm) reading rate. Because the
conditions contained unequal numbers of
subjects, a least squares solution to analysis
of variance was employed in this and subsequent
analyses. Neither analysis yielded
significant effects. The mean number correct
was 12.4 and the mean reading rate was
309.8 wpm.


Adjunct Questions 


A 2 X 2 X 2 analysis of variance was conducted
on the number of correct responses
to the adjunct questions. As might be expected,
there were significant main effects
for all three factors. Students permitted to
review the text (M = 9.3) did better than
students not permitted to review (M = 4.3).
Performance was better when the questions
were inserted in the text (M = 7.5) than
when they were massed at the end (M = 5.9),
and students given verbatim questions (M
= 7.3) did better than students given paraphrased
questions (M = 6.2). The respective
Fs(1, 158) for each comparison were
268.7, 28.5, 19.9, p < .01, MSe = 3.94.

The Review Possible X Type of Question
interaction was significant, F(1, 158) = 17.1,
p < .01. The means were as follows:
review-verbatim = 10.8, review-paraphrased
= 7.9, no review-verbatim = 4.4, and no
review-paraphrased = 4.3. Basically, the
difference between review and no review was
greater for verbatim than for paraphrased
questions. Also significant was the Question
Position X Type of Question interaction,
F(1, 158) = 14.0, p < .01. The means were
as follows: inserted-verbatim = 10.8,
inserted-paraphrased = 7.5, massed-verbatim
= 7.0, massed-paraphrased = 4.9. No other
interactions were significant.


Table 1
Mean Number of Correct Responses on
Paraphrased Posttest Questions

                                 Type of adjunct question
Group                            Verbatim      Paraphrased

Inserted questions
  Review                         4.79 (19)     5.79 (19)
  No review                      3.09 (22)     4.38 (21)
  M                              3.87          5.05
Massed questions
  Review                         4.44 (18)     4.25 (24)
  No review                      2.59 (22)     2.19 (21)
  M                              3.42          3.28
Control
  Retention test only             .81 (12)
  Passage and retention test     2.92 (21)
  Passage and retention test
    with review                  5.18 (10)

Note. Numbers in parentheses represent ns for each group.


Paraphrased Posttest Questions 


Table 1 presents the mean number of 
correct responses for each condition. There 
were significant effects for only review possible,
F(1, 158) = 23.2, p < .01; question position,
F(1, 158) = 9.86, p < .01; and the
Question Position X Type of Question interaction,
F(1, 158) = 3.92, p < .046 (MSe =
5.46 in all cases). Performance was better
when review was permitted (M = 4.79) than
when it was not (M = 3.06), and students
performed better with inserted adjunct
questions (M = 4.46) than with massed adjunct
questions (M = 3.35). The Question
Position X Type of Question interaction is
most interesting and is shown in the marginal
means in Table 1. When questions
were massed at the end of the passage, there 
was little difference between verbatim and 
paraphrased questions; in fact, verbatim 
questions were slightly superior to para- 
phrased questions. However, when ques- 
tions were inserted in the text, paraphrased 
questions substantially improved perfor- 
mance. 
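
The interaction can be checked against the cell means and ns reported in Table 1. The sketch below assumes that the marginal means are n-weighted averages of the review and no-review cells; that assumption reproduces the printed marginals to within rounding (two of the four differ by .01). The variable and function names are introduced here for illustration.

```python
# (position, question type, review) -> (mean correct, n), copied from Table 1.
CELLS = {
    ("inserted", "verbatim", "review"): (4.79, 19),
    ("inserted", "paraphrased", "review"): (5.79, 19),
    ("inserted", "verbatim", "no review"): (3.09, 22),
    ("inserted", "paraphrased", "no review"): (4.38, 21),
    ("massed", "verbatim", "review"): (4.44, 18),
    ("massed", "paraphrased", "review"): (4.25, 24),
    ("massed", "verbatim", "no review"): (2.59, 22),
    ("massed", "paraphrased", "no review"): (2.19, 21),
}


def marginal_mean(position, question_type):
    """n-weighted mean over the review and no-review cells."""
    rows = [CELLS[(position, question_type, r)] for r in ("review", "no review")]
    total_n = sum(n for _, n in rows)
    return sum(m * n for m, n in rows) / total_n


def paraphrase_advantage(position):
    """Paraphrased minus verbatim marginal mean at one question position."""
    return marginal_mean(position, "paraphrased") - marginal_mean(position, "verbatim")


# Error degrees of freedom: total n in the eight cells minus number of cells.
error_df = sum(n for _, n in CELLS.values()) - len(CELLS)

print(round(paraphrase_advantage("inserted"), 2))
print(round(paraphrase_advantage("massed"), 2))
print(error_df)
```

The paraphrase advantage is a little over one item when questions are inserted and slightly negative when they are massed, and the error degrees of freedom computed from the cell ns agree with the 158 reported for the F tests.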

Comparison of the experimental condi- 
tions with the control conditions yielded 
interesting results. Dunnett's test was used 
for all comparisons. Because the number of 
comparisons is large we do not report each 
value, but instead summarize the results. In 
comparison with the control group that 
merely took the retention test without 
reading the passage, all experimental groups 
except the two massed-no review conditions 
did significantly better. In comparison with 
the control condition that merely read the 
passage, none of the experimental groups did 
significantly better except for the inserted- 
review-paraphrased condition. The means 
of the two inserted-paraphrased conditions
also differed significantly from this control. 
In comparison with the condition given the 
passage and the retention test and permitted 
to review the passage to answer the retention 
test questions, only the two massed-no re- 
view conditions did significantly more 
poorly. 

In order to further explore the effect of 
type of question it seemed worthwhile to 
examine performance on the posttest as a 
function of performance on the adjunct 
questions. Accordingly, the arc sine trans- 
formed proportion of correct paraphrased 
items to correct adjunct items was computed 
and subjected to a 2 X 2 X 2 analysis of 
variance. The results indicated only a significant
main effect for type of question, F(1, 146) = 6.21,
p < .013, MSe = .068. Paraphrased
adjunct questions led to better
performance (M = .89) than did verbatim 
adjunct questions (M = .79). 
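
For readers unfamiliar with the transformation, a minimal sketch of an arc sine transform of a proportion follows. The text does not say which variant was used, so the form arcsin(sqrt(p)) is an assumption (some sources use twice this value), as is the item-level reading of the proportion in `conditional_proportion`; both function names are hypothetical.

```python
import math


def conditional_proportion(adjunct_correct, posttest_correct):
    """Proportion of a subject's correctly answered adjunct items whose
    paraphrased posttest version was also answered correctly.
    Returns None when no adjunct item was correct (proportion undefined)."""
    adjunct_correct = set(adjunct_correct)
    if not adjunct_correct:
        return None
    both = adjunct_correct & set(posttest_correct)
    return len(both) / len(adjunct_correct)


def arcsine_transform(p):
    """Variance-stabilizing transform arcsin(sqrt(p)) for 0 <= p <= 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(p))


# Hypothetical subject: adjunct items 1-4 correct; posttest items 2, 4, 9 correct.
p = conditional_proportion([1, 2, 3, 4], [2, 4, 9])
print(p, round(arcsine_transform(p), 4))
```

Subjects with no correct adjunct responses have an undefined proportion under this reading and would have to be dropped from such an analysis.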


Near Posttest Questions 


The only significant source of variance on 
the analysis of the near questions was 
Question Position, F(1, 158) = 4.13, p < .041,
MSe = 2.92. Performance was better with
inserted questions (M = 3.36) than with 
massed questions (M = 2.85). 


Far Posttest Questions 


No sources of variance were significant in 
the analysis of far questions. The overall 
mean number correct was 3.61, MSe =
4.82.


Time 

Also analyzed was the amount of time 
subjects took to complete the passage. Only 
the interaction of Review X Position of 
Question proved significant, F(1, 158) =
4.545, p < .01, MSe = 2727.28. The means
were the following: no review-inserted = 
29.7 minutes; no review-massed = 28.7 
minutes; review-inserted = 27.0 minutes; 
review-massed = 32.8 minutes. 


Discussion 


The results of this study generally support 
the rationale described in the introduction 
and provide useful information as to when 
paraphrased adjunct questions will facilitate 
meaningful learning from prose. According 
to our rationale, when review is not permit- 
ted, paraphrased questions should facilitate 
performance on a paraphrased posttest only 
when inserted close to the relevant text in- 
formation. This prediction was confirmed. 
Our rationale also suggested that, when re- 
view is permitted, paraphrased questions 
should facilitate performance both when 
inserted and when massed at the end of the 
passage. This prediction was not confirmed. 
With review, inserted paraphrased questions 
facilitated performance, but massed paraphrased
questions did not.

A little hindsight suggests an explanation.
We had expected that with review, readers
would have no difficulty finding the relevant
sentence and answering either verbatim or
paraphrased adjunct questions. In other
words, Pmatch would be close to 1. However,
it was probably the case that finding the 
relevant sentence was easier when the 
questions were inserted in the passage than 
when they were not. Readers who had 
massed questions were faced with the task 
of finding 12 specific sentences embedded in
a lengthy passage. Thus we suspect that
when questions were massed and review was
permitted, readers simply could not locate
many of the target sentences and hence
could not answer some adjunct questions.
As we stated earlier, an unanswered adjunct
question will not facilitate performance.
This interpretation is supported by the
performance of the third control group,
which was given the passage and the posttest
and asked to find answers. Performance in
this group averaged about 40% correct, attesting
to the difficulty of the search task.
Presumably, had we made it easier for 
subjects in the massed conditions to find the 
relevant target sentences, the expected in- 
teraction of question position and review 
would have occurred. 

The results also are consistent with our 
arguments about Pmatch and Psr. The
reader will recall that Pmatch refers to the
probability of a match between the adjunct
question and the memory trace of the relevant
information, and Psr is the probability
of semantic recoding given that a match occurs.
Psr for verbatim items should be
lower than Psr for paraphrased items. If a
reader correctly answers an adjunct ques- 
tion, a match has presumably occurred 
(Pmatch = 1). If posttest performance is 
examined as a function of adjunct question
[...] items. Consistent with our rationale, only
the effect of type of question was significant
in this analysis, and paraphrased adjunct
questions led to improved performance.

The question of whether adjunct
questions exert their effect through either a
forward strategy effect or a backward review
effect can be raised (McConkie, Rayner, &
Wilson, 1973; McGaw & Grotelueschen,
1972). A forward strategy effect would
occur if type of question influenced
subjects’ subsequent reading strategy.
Presumably, subjects receiving paraphrased
questions would use a reading-for-meaning
strategy, whereas subjects receiving verbatim
questions would use a rote processing
strategy. A backward review effect would 
occur if the effect of the question was to in- 
fluence the specific memory trace of the item 
of information asked about. The data seem 
most congruent with a backward effect. If 
type of question influenced reading strategy, 
then we would expect that type of question 
would influence performance on the inci- 
dental questions on the posttest; it did not. 
Also, changes in reading strategy should be 
reflected in changes in reading speed 
(McConkie et al., 1973). Although reading
rate was not measured directly, type of 
question did not influence time to complete 
the passage. Because the effect of type of 
question seemed specific to the material 
asked about, a backward effect is indicat- 
ed. 

The present results are clearly congruent 
with the depth or levels-of-processing no- 
tions suggested by Craik and Lockhart 
(1972) and Anderson (1970). However, they 
are also congruent with the recent transfer- 
appropriate-processing notion suggested by 
Morris, Bransford, and Franks (1977). This 
latter notion holds that students may process 
to-be-learned information in various ways. 
Some of these ways will be appropriate for 
some transfer tasks but not for others. 
Performance on the transfer task will be 
determined by its relationship to the pro- 
cessing strategies used in the learning task. 
This notion is similar to Mayer’s work on the 
relationship between learning strategies and 
near and far transfer (Mayer, 1975, 1977) 
and is related to the notion of encoding 
specificity (Thomson & Tulving, 1970). 
Under the transfer-appropriate-processing 
hypothesis, our paraphrased-adjunct- 
question subjects were led to process the 
meaning of the target sentences, thus a 
memory representation was set up that was 
appropriate for the paraphrased questions 
transfer task on the posttest. Students given 
verbatim adjunct questions processed non- 
meaningful aspects of the target sentences 
and hence did more poorly on the transfer 
task. Up to this point the transfer-appro- 
priate-processing and levels-of-processing 
notions are identical. However, the trans- 
fer-appropriate-processing hypothesis would 
go further and predict that if we had used a
transfer task in which nonmeaningful cues 
were important, the verbatim group would 
have done better than the paraphrased- 
adjunct-questions group. 

Obviously, the present experiment does 
not provide a way of separating the levels- 
of-processing and transfer-appropriate- 
processing explanations of the results. One 
way of separating these hypotheses might be 
to repeat the verbatim adjunct questions on 
the posttest. If the verbatim group out- 
performs the paraphrased group on these 
questions, the transfer-appropriate-pro- 
cessing hypothesis will be confirmed. 
Whereas the present study is mute on this 
issue, the Andre and Sola (1976) study did 
include a verbatim posttest. In that study, 
performance of the verbatim and para- 
phrased groups on the repeated verbatim 
questions did not differ significantly. 
Whereas this result might seem to support 
a levels-of-processing view, the interpreta- 
tion is not clear-cut. We have not asserted 
that readers given verbatim questions are 
specifically led to process orthographic/ 
phonological features, only that they can 
sometimes answer verbatim questions with- 
out processing meaning. Moreover, subjects 
given paraphrased questions have to process 
orthographic/phonological features of the 
text as well. Because both conditions in- 
volve some processing of orthographic/ 
phonological features, the use of repeated 
verbatim items on the posttest would not 
provide a clear test of the levels-of-process- 
ing versus transfer-appropriate-processing 
hypotheses. 

It is important to note that even students 
given verbatim adjunct questions answered 
a number of the paraphrased posttest 
questions correctly. We believe that 
subjects normally process instructional text 
for meaning: In other words, people try to 
understand what they read. When people 
do understand, probably either verbatim or 
paraphrased adjunct questions will have 
equal effects on later performance. In the 
context of the present study this means that 
if a subject understood a target sentence, 
received a verbatim adjunct question, and 
answered it on a meaningful basis, the 
question probably helped his or her later 
performance as much as if he or she had answered
a paraphrased question. We think that it is 
when the normal reading-to-understand 
process fails that paraphrased questions 
have their greatest effect. Paraphrased 
questions may have this effect in either of 
two ways. First, the subject may have a 
representation of the target sentence that is 
based upon orthographic or phonological 
cues. Given the paraphrased adjunct 
question, the subject may be able to match 
the adjunct question with the representation 
of the target sentence, then re-encode the 
target sentence on a meaningful basis. 
Second, if review is permitted, the subject 
may find the target sentence, reread it, and 
re-encode it meaningfully. In either case, 
the effect of paraphrased questions will be 
greatest when the normal reading-to-un- 
derstand process fails. 
Effects of Motivation, Subject Activity, and Readability
on the Retention of Prose Materials 


Warren Fass and Gary M. Schumacher 
Ohio University 


College students were asked to read a prose passage and to take an exam on the 
contents of the passage. Subjects received an easy or hard version of the pas- 
sage, were either allowed or not allowed to underline key phrases while read- 
ing, and were either motivated or not motivated by payment of money. Three 
key results were found: (a) Non-highly motivated subjects performed better 
on the easy version than on the hard version of the text; (b) underlining aided 
only highly motivated subjects; and (c) underlining aided the subjects who 
worked on the hard version of the text. Results were interpreted from moti-
vational and activity viewpoints.


Research on the production of readable 
writing typically changes the readability of 
a text to produce easier and/or harder ver-
sions of the text. One might predict that an
easier (and thus more readable) version of a
text would yield higher comprehension 
scores than a harder version; however, this 
is not always the case. After analyzing 36 
readability studies, Klare (1976) developed 
a model describing how various factors in- 
teract and affect the results of these stud- 
ies. 

One of the factors discussed by Klare 
(1976) was the motivational level of the 
subjects participating in the experiments. 
According to Klare, if a subject is highly 
motivated, either by payment of money, 
threat, or interest, the effects of changing the 
readability level of a text may not be found. 
For example, McLaughlin (1966) manipu- 
lated motivation through the presence or 
absence of threat. McLaughlin presented 
subjects with either easy or hard versions of 
a text that differed in format, text length, 
and sentence length. The results showed 
that there were no differences in compre- 
hension scores between the easy and hard 
versions of the text for the highly motivated 
subjects. However, the subjects who were 
not highly motivated performed significantly
better on the easy version. Similarly, Den- 
bow (1973) found that subjects who rated a 
passage lower in interest value (low moti- 
vation) showed higher gains attributed to 
improved readability than subjects who 
rated the passage higher in interest value 
(high motivation). 

Thus, it seems that if subjects are highly
motivated, the effects of changing the read-
ability level of a text are less likely to be
found. One explanation for these findings 
is that subjects who are highly motivated 
compensate for the differences in readability 
by performing more or different types of 
cognitive activities than subjects who are not 
highly motivated, thereby eliminating the 
effects of readability differences. In con- 
trast, subjects who are not highly motivated 
might not perform the necessary cognitive 
activities and, consequently, may be in- 
fluenced more by the nature of the pas- 
sages. 
Activity explanations such as this have 
become popular since the publication of an 
article by Craik and Lockhart (1972). Ac- 
cording to these authors, a greater degree of 
cognitive analysis promotes deeper pro- 
cessing of the to-be-remembered material 
and hence results in better retention. 
Similar arguments have been offered by 
Meacham (1972) and Fass (1976). 

A number of recent articles have at- 
tempted to directly assess the importance of 
the types of activities subjects carry out for 
their ability to retain the content of prose
passages (Arkes, Schumacher, & Gardner, 
1976; Frase & Schwartz, 1975; Rickards & 
August, 1975). For example, Arkes, Schu- 
macher, and Gardner (1976) found that se- 
mantic tasks, as compared to nonsemantic 
tasks, resulted in better recall; they argued 
that improved recall of prose material is re- 
lated not only to the type of task carried out 
but also to the duration of the task interac- 
tion. Frase and Schwartz (1975) and Rick- 
ards and August (1975) found that if subjects 
generated their own activities (either by 
writing their own questions or by underlining 
their own phrases), they performed better on 
subsequent comprehension tests than 
subjects who received the experimenter’s 
generated activities (either postquestions or 
preunderlined words). 

These studies, which manipulated activi-
ties directly, suggest that the type of activity
that subjects carry out does dramatically
influence their ability to recall. If motiva-
tional effects found in earlier studies were
related at least in part to the subjects’ ac-
tivities, motivational effects may be found
to differ depending on other activities the
subject is asked to perform.

Thus, the type of activity may have to be 
taken into account if an activity interpreta- 
tion of the motivational differences is to 
work. Since subject-generated activities are 
more generalizable to classroom situations, 
more research is needed to better under-
stand their effects.

The purpose of the present study was to 
clarify the role of motivation and activity 
factors in the ability to recall information 
from passages of varying levels of complex- 
ity. College students were asked to read a 
prose passage and to take an exam on the 
contents of the passage. Subjects received
either an easy or hard version of the passage,
were either allowed or not allowed to un- 
derline key words or phrases while reading, 
and were either motivated or not motivated 
by payment of money. 

A number of particular interaction effects 
were hypothesized to occur. First, based on 
the work by Klare (1976), subjects who were
not highly motivated were expected to per- 
form better on the easy version of the pas- 
sage than on the hard version; subjects who 
were highly motivated were expected to
perform equally well on both versions.
Second, Fass (1976) found that non-highly 
motivated subjects were influenced more by 
induced activities; therefore, we predicted 
that carrying out a specific, relevant, 
subject-generated activity would aid per- 
formance only for subjects who were not 
highly motivated. Third, since a hard pas- 
sage necessitates a greater amount of pro- 
cessing to be comprehended and retained, we 
expected that carrying out a specific, rele-
vant, subject-generated activity would aid
performance more on a hard passage than on 
an easy one. 


Method 
Subjects 


Serving as subjects were 70 male and 90 female un-
dergraduates. They were volunteers who were given
course credit for participation. In addition, half the
subjects had the opportunity to receive payment of
money, depending upon their performance in the ex-
periment.


Stimulus Materials 


Two versions of a passage dealing with the principles 
of enzymes were used as the stimulus materials. One 
version, the easy version, contained 1,009 words, 65 
sentences, and had a mean sentence length of 15.5 
words. The easy version had a Flesch (1948) reading 
ease score of 61 (seventh-eighth-grade level). The hard 
version of the passage contained 1,057 words, 36 sen- 
tences, and had a mean sentence length of 29.3 words. 
This version had a Flesch (1948) reading ease score of 
31 (high school level). A 15-item, four-alternative
multiple-choice test was generated to test the subjects’
retention of the information contained in the pas- 
sages. 
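
The sentence-length figures above follow directly from the reported word and sentence counts, and the reading ease scores can be related to them through the standard Flesch (1948) formula. The Python sketch below is illustrative only. It assumes the published formula (206.835 - 1.015 × words per sentence - 84.6 × syllables per word) was applied without modification; the syllables-per-word values it derives are back-calculated and are not reported in the study, and all function names are hypothetical.

```python
# Standard Flesch (1948) reading ease formula (assumed, not confirmed by the study):
#   RE = 206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word)

def mean_sentence_length(words: int, sentences: int) -> float:
    """Average number of words per sentence."""
    return words / sentences


def reading_ease(words_per_sentence: float, syllables_per_word: float) -> float:
    """Flesch reading ease score for the given averages."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def implied_syllables_per_word(score: float, words_per_sentence: float) -> float:
    """Solve the formula for syllables per word (a back-calculation only)."""
    return (206.835 - 1.015 * words_per_sentence - score) / 84.6


# Word counts, sentence counts, and scores as reported for the two versions.
PASSAGES = {
    "easy": {"words": 1009, "sentences": 65, "score": 61},
    "hard": {"words": 1057, "sentences": 36, "score": 31},
}


def summarize(passages):
    """Return {version: (mean sentence length, implied syllables per word)}."""
    summary = {}
    for name, p in passages.items():
        asl = mean_sentence_length(p["words"], p["sentences"])
        summary[name] = (asl, implied_syllables_per_word(p["score"], asl))
    return summary


if __name__ == "__main__":
    for name, (asl, asw) in summarize(PASSAGES).items():
        print(f"{name}: {asl:.2f} words/sentence, "
              f"about {asw:.2f} syllables/word implied")
```

Under these assumptions, the hard version differs from the easy one both in sentence length and in the implied word length, which is consistent with the lower reading ease score.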

Klare (1976) has argued that if subjects are more fa-
miliar with the contents of a passage and/or more in-
terested in a passage, the effects of the readability dif-
ferences between passages may not be found. There-
fore, subjects’ judgments of interest in and familiarity
with the content of the enzyme passage were obtained
by using two 5-point rating scales. A rating score of 1
meant that the passage was either very uninteresting
or totally unfamiliar. A rating score of 5 meant that the
passage was either very interesting or very familiar.
Subjects’ scores on these two ratings were used as co-
variates in an analysis of covariance to control for in-
terest and familiarity levels.


Design 


A 2 × 2 × 2 factorial design, using an analysis of co-
variance, was employed. The first factor was motiva-
tion (Highly Motivated × Not Highly Motivated), the
second factor was the readability level of the passage
(Easy × Hard), and the final factor was the type of task
(Underline × Read).
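
The layout of the design can be made concrete with a short Python sketch. It enumerates the eight cells, assigns 160 subject identifiers to them in equal numbers, and computes the error degrees of freedom for an analysis of covariance with two covariates. The random assignment shown is a simplification (the study collected the low-motivation data first), and the function names and seed are hypothetical.

```python
import itertools
import random

# The three two-level factors named in the Design section.
FACTORS = {
    "motivation": ["highly motivated", "not highly motivated"],
    "readability": ["easy", "hard"],
    "task": ["underline", "read"],
}
N_SUBJECTS = 160    # 70 male + 90 female undergraduates
N_COVARIATES = 2    # interest and familiarity ratings


def cells():
    """All combinations of factor levels (the cells of the design)."""
    return list(itertools.product(*FACTORS.values()))


def assign(n_subjects, seed=0):
    """Randomly assign subject ids to cells with equal cell sizes."""
    design = cells()
    per_cell, remainder = divmod(n_subjects, len(design))
    if remainder:
        raise ValueError("subjects cannot be divided equally among cells")
    ids = list(range(n_subjects))
    random.Random(seed).shuffle(ids)
    return {
        cell: ids[i * per_cell:(i + 1) * per_cell]
        for i, cell in enumerate(design)
    }


def error_df(n_subjects, n_cells, n_covariates):
    """Error degrees of freedom for a factorial ANCOVA."""
    return n_subjects - n_cells - n_covariates


if __name__ == "__main__":
    assignment = assign(N_SUBJECTS)
    for cell, members in assignment.items():
        print(cell, len(members))
    print("error df:", error_df(N_SUBJECTS, len(assignment), N_COVARIATES))
```

With 160 subjects, 8 cells, and 2 covariates, the error degrees of freedom come to 150, which agrees with the F(1, 150) values reported in the Results.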


Procedure 


Twenty subjects were randomly assigned to one of the
readability conditions, with either the underline or
read-only task, under conditions of either high or low
motivation. The subjects participated in study groups
of 10 or fewer at a time. When all subjects were present,
the instructions and reading booklets were distributed.

The general instructions for all groups specified that
this was an experiment to determine how people read
written materials. The subjects were told that they 
would be given a passage to read and that they would 
be given a multiple-choice test on the content of the 
passage after reading the passage. All subjects were 
told that they would have 10 minutes to read the pas- 
sage and that if they finished before the time was up, 
they could reread the passage. Subjects in the read- 
only conditions were told not to make any marks on the 
pages, while subjects in the underline conditions were 
told that they should underline key words or phrases as 
they were reading the passage. Subjects in the high 
motivation conditions were told that the five subjects 
who obtained the highest scores on the multiple-choice 
test would receive $5. Subjects in the low motivation 
conditions were unaware of the payoff procedure.

The instructions for all groups further specified that
after the end of the 10-minute reading period, the
subjects would be required to complete a brief ques-
tionnaire about the passage. The subjects were told
that they would have 1 minute to complete this ques- 
tionnaire. When the minute had ended, they were told 
to turn to the next page and begin answering the ques- 
tions on the multiple-choice test and that they would 
have 15 minutes to complete this test. 

After completing the instructions, the subjects were 
asked if they had any questions. After we answered 
their questions, the experiment began. All data for the
subjects in the low motivation conditions were collected
first.


Results 


An analysis of variance was performed on 
each of the two covariates (interest- and fa- 
miliarity-of-content ratings) to determine if 
either covariate was affected by the treat- 
ment manipulations (motivational level, 
readability level, and type of task) used in 
this study. The two analyses revealed that 
the interest and familiarity ratings were not 
influenced by the manipulations. There- 
fore, both the interest and familiarity ratings 
met the conditions for use as covariates (see 
Kirk, 1968, p. 458) and were used in an 
analysis of covariance to reduce error vari- 
ance. 
Table 1
Adjusted Mean Number of Questions Answered Correctly as a Function of the
Motivation, Task, and Readability Manipulations

                      Readability level
Factor                Easy      Hard
Motivation
  Underline           9.14      9.23
  Read                7.95      6.49
No motivation
  Underline           8.54      6.95
  Read                7.95      5.55

Note. The overall means for each manipulation are as follows:
motivation = 8.20, no motivation = 7.25, underline = 8.46, read
only = 6.98, easy version = 8.40, and hard version = 7.05.


Table 1 shows the adjusted mean number 
of questions answered correctly as a function 
of the motivation, task, and readability 
manipulations. The analysis of covariance 
on recognition scores yielded significant 
main effects for the motivation, readability, 
and task manipulations, F(1, 150) = 4.28, p 
< .05; F(1, 150) = 8.38, p < .01; and F(1, 150) 
= 10.11, p < .01, respectively. Subjects who 
were highly motivated performed better on 
the recognition test than subjects who were 
not highly motivated, subjects who received 
the easy version of the passage performed 
better on the recognition test than subjects 
who received the hard version, and subjects 
who were allowed to underline while reading 
performed better on the recognition test 
than subjects who only read the passage. 

Although all interactions between the 
motivation, readability, and task manipu- 
lations failed to reach significance, a number 
of planned a priori contrasts on the adjusted 
data, using Dunn's procedure (see Kirk, 
1968), were significant. These contrasts 
revealed that the subjects who were not 
highly motivated and received the easy ver- 
sion of the passage performed significantly 
better, F(1, 150) = 9.33, p < .01, on the rec-
ognition test (M = 8.25) than the subjects
who were not highly motivated and received 
the hard version (M = 6.25). The subjects 
who were highly motivated and were allowed 
to underline performed significantly better, 
F(1, 150) = 9.10, p < .01, on the recognition 
test (M = 9.19) than the subjects who were
highly motivated and only read the passage
(M = 7.22). Finally, the subjects who re-
ceived the hard version of the passage and
were allowed to underline performed sig- 
nificantly better, F(1, 150) = 10.09, p < .01, 
on the recognition test (M = 8.09) than the 
subjects who received the hard version and 
only read the passage (M = 6.02). Other a 
priori contrasts revealed that there were no 
differences on the recognition test between 
(a) the easy and hard versions of the passage 
for the subjects who were highly motivated; 
(b) the underline and read-only conditions 
for subjects who were not highly motivated; 
and (c) the underline and read-only condi- 
tions for the subjects who received the easy 
version of the passage. 
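
The means cited for the main effects and the planned contrasts can be recovered from the cell means in Table 1. The Python sketch below assumes that each reported mean is the unweighted average of the relevant cells, which is reasonable given equal cell sizes but is not stated explicitly in the text. The level labels "high" and "low" and the function name are hypothetical shorthand.

```python
from statistics import mean

# Adjusted cell means from Table 1, keyed by (motivation, task, readability).
AXES = ("motivation", "task", "readability")
CELLS = {
    ("high", "underline", "easy"): 9.14,
    ("high", "underline", "hard"): 9.23,
    ("high", "read", "easy"): 7.95,
    ("high", "read", "hard"): 6.49,
    ("low", "underline", "easy"): 8.54,
    ("low", "underline", "hard"): 6.95,
    ("low", "read", "easy"): 7.95,
    ("low", "read", "hard"): 5.55,
}


def marginal(**fixed):
    """Unweighted mean of all cells whose levels match the keyword arguments."""
    index = {name: i for i, name in enumerate(AXES)}
    picked = [
        value
        for key, value in CELLS.items()
        if all(key[index[name]] == level for name, level in fixed.items())
    ]
    return mean(picked)


if __name__ == "__main__":
    # Main-effect means (compare with the note to Table 1).
    for axis, levels in (("motivation", ("high", "low")),
                         ("task", ("underline", "read")),
                         ("readability", ("easy", "hard"))):
        for level in levels:
            print(axis, level, f"{marginal(**{axis: level}):.3f}")
    # Means entering the three significant planned contrasts.
    print(f"{marginal(motivation='low', readability='easy'):.3f}",
          f"{marginal(motivation='low', readability='hard'):.3f}")
    print(f"{marginal(motivation='high', task='underline'):.3f}",
          f"{marginal(motivation='high', task='read'):.3f}")
    print(f"{marginal(readability='hard', task='underline'):.3f}",
          f"{marginal(readability='hard', task='read'):.3f}")
```

Each value printed agrees with the corresponding mean in the text to within rounding, which suggests that the contrasts were computed on pairs of pooled cells.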


Discussion 


The results of the present study support 
the findings of earlier work that improving 
the readability of a passage (see Klare, 1976, 
for a review) and increasing the motivational 
level of subjects (Frase, 1971; Frase, Patrick, 
& Schumer, 1970) enhance subjects' per- 
formance and that these factors interact 
(Klare, 1976). The fact that the present 
study found that the readability level of a 
passage had a greater impact on non-highly 
motivated subjects than on highly motivated 
subjects extends the findings of McLaughlin
(1966) and Denbow (1973) as to the type of 
motivator that influences the differences in 
the readability level of a passage and as to 
the type of test that is sensitive to these 
differences. Thus, as Klare (1976) pointed 
out, readability is less important when 
motivation is high than when it is
low.

The results also show that the type of ac-
tivity a subject performs affects performance
and interacts with the subject’s motivational
level and the readability level of a passage.
In particular, the results support the findings
of Rickards and August (1975) that subjects
who are allowed to underline while reading
perform better than subjects who only read
a passage. The underlining procedure ap-
parently forced the former subjects into in-
teracting more with the content of the pas-
sage. This greater interaction by the
subjects who underlined may have resulted
in their processing the material at a deeper
level, hence improving their ability to recall
the passage in comparison to subjects who
only read the passage.

We predicted that the underlining pro- 
cedure would have its greatest impact on the 
non-highly motivated subjects. However, 
the opposite occurred. This pattern of re- 
sults was unexpected, since Fass (1976) 
found that some types of activities were more 
beneficial for non-highly motivated subjects.
However, Fass did not manipulate motiva-
tion directly and used experimenter-gener-
ated activities rather than the subject-gen-
erated activity used in the present study. It
is possible that subject-generated activities
necessitate greater involvement on the part
of the subject, a condition that may occur
only with highly motivated subjects. In ef-
fect, the non-highly motivated subjects
might have underlined fewer phrases or
more inappropriate phrases than the
subjects who were highly motivated.

In order to determine if this was the case, 
an analysis of variance was performed on the 
total number of underlinings and on the 
appropriateness of the underlinings. An 


subjects (5.23 vs. 4.47). Although this result 
must be interpreted cautiously, non-highly
motivated subjects may not have interacted
appropriately with the materials, resulting
in poorer performance.

In relation to the underlining activity, we 
also predicted that the underlining proce- 
dure would have its greatest impact on the 
subjects who received the hard version of the
passage. This prediction was upheld by the
analyses. It seems that the easy version was
easy enough so that increasing the amount
of activity was not beneficial, whereas in-
creasing the amount of activity in processing
the hard version was beneficial.

Finally, it is possible that the results 
showing equal performance on the easy and
the hard versions for the highly motivated
subjects were due, in part, to the beneficial
effects of underlining. Underlining ap- 
peared to aid performance for highly moti- 
vated subjects and subjects who received the 
hard version. Thus, the underlining pro- 
cedure may have increased the scores for the 
highly motivated subjects when they re- 
ceived the hard version. The analyses also 
suggest that the superior performance on the 
easy version for the non-highly motivated
subjects was due, in part, to the lack of an
effect of the underlining procedure for these
subjects. 

In summary, the present study suggests 
that an adequate conceptualization of prose 
retention must include provisions to account 
for (a) the differences in the difficulty of 
passages used, (b) the motivational level of 
subjects, and (c) the type of activity a subject 
employs to process the passage. 
Reading Skill and Memory Scanning


Robert V. Kail, Jr., and Christine Vereb Marshall 
University of Pittsburgh 


Four experiments were conducted to investigate differences between skilled 
and less skilled readers in the rate with which they scan memory. In each ex- 
periment, third and fourth graders read one to three unrelated statements, 
then answered a yes-no question pertaining to one of the statements. The 
primary result from Experiments 1 and 2, in which children read all material 
aloud, was that skilled readers answered questions approximately .6 sec faster 
than less skilled readers when reading time was partialed out. In Experiment
3, similar results were found for silent reading. In Experiment 4, the differ- 
ence in answering time found in Experiments 1-3 was no longer significant 
when the scan component in answering was minimized. 


Memory has often been linked to reading 
skill and reading disability (Torgeson, 1975). 
For example, good and poor readers have 
been found to differ in terms of mnemonic 
processes such as rehearsal (Torgeson & 
Goldman, 1977) and encoding (Perfetti & 
Goldman, 1976). The focus of the present 
research was the relation of reading skill to 
another memory process, that of memory 
scanning. The relation of memory scanning 
to reading skill was suggested by the results 
of Kail, Chi, Ingram, and Danner (1977). In 
that study, sixth graders read a three-sen- 
tence paragraph, then read and answered a 
simple yes-no question concerning the in- 
formation contained in the paragraph. 
Scores on a reading comprehension test re- 
lated to the speed with which questions were 
answered, with higher reading scores asso-
ciated with faster answers. This was not
surprising, since it was necessary to read the
question prior to answering it, and the
measure of question answering speed re-
flected both reading and answering times.
However, when reading time was held con-
stant statistically, reading ability and an-
swering time were still highly correlated
(-.64). Kail et al. (1977) argued that with
reading time held constant, answering time
presumably reflects the time required to (a)
scan memory for information relevant to the
question and (b) compare the retrieved in-
formation with the question and determine
the appropriate answer. Thus, it appeared
that children of varying levels of reading skill
differed in the rate with which they scanned
and compared items in memory. The pur 


Experiment 1 


Our first experiment is modeled after
Sternberg's (1966) research on memory
scanning for digits. In Sternberg's research,
a subspan set of digits is shown, followed by
a single digit. The subject’s task is to indi-
cate if the single digit was included in the


single digit apparently is compared with each
digit of the memorized set. The subject then
answers “yes” if a match has been detected,
“no” if not. According to this model, the
slope of the reaction time function provides
an estimate of the time necessary to compare
the single digit with a member of the set.
We used a similar procedure in Experi- 
ment 1, substituting statements for digits, in 
order to separate memory comparison pro- 
cesses from encoding and response execu- 
tion. Children read either one or three 
statements, which were followed immedi- 
ately by a question regarding one of the 
statements. Following Sternberg (1966), an
increase in answering time between the one-
and three-sentence conditions would be at-
tributed to the increased time required to
retrieve and compare the larger set of
statements. More important, if skilled and
less skilled readers differ in the speed with
which information in memory is compared,
then we would expect an interaction between
reading skill and number of statements:
The difference between skilled and less
skilled readers’ answering times should in-
crease as the number of statements read by
subjects increases. 
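
The logic of the slope estimate and of the predicted interaction can be illustrated with a small Python sketch. The answering times below are invented for illustration and are not data from the experiment; the function names are likewise hypothetical. The slope of answering time against number of statements is taken as the per-statement comparison time, and the predicted interaction amounts to a steeper slope for less skilled readers.

```python
def slope_intercept(set_sizes, times):
    """Least-squares slope and intercept of answering time on set size."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, times))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# HYPOTHETICAL mean answering times (sec) by number of statements read.
HYPOTHETICAL_TIMES = {
    "skilled": {1: 1.50, 3: 1.90},
    "less skilled": {1: 1.80, 3: 2.80},
}


def scan_rates(data):
    """Per-statement comparison time (slope) for each reading group."""
    return {
        group: slope_intercept(list(times), list(times.values()))[0]
        for group, times in data.items()
    }


if __name__ == "__main__":
    for group, rate in scan_rates(HYPOTHETICAL_TIMES).items():
        print(f"{group}: {rate:.2f} sec per additional statement")
```

In this invented example the gap between the groups widens from the one-statement to the three-statement condition, which is the pattern the interaction hypothesis predicts.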


Method 


Participants. The skilled reading group consisted 
of five third- and five fourth-grade children who scored 
in Stanines 6-9 on the Science Research Associates’ 
Reading subtest. The less skilled reading group con- 
sisted of five third-grade and five fourth-grade children 
in Stanines 3-5. Intelligence test scores were not 
available for most of the children; however, according
to teachers’ reports, all children in the sample were
making average or above-average progress in school in
all subjects except reading.

Materials. Forty-eight sets of statements and 
questions were constructed. In 24 sets, a single state- 
ment was followed by a question; in another 24, three 
unrelated statements on a single slide were followed by 
a question. Each set contained 12 yes and 12 no ques-
tions. All statements contained five words. Questions,
also five words long, were prepared for each sentence by 
presenting the statement preceded by “was” or “did.” 
Questions for which the correct answer was “no” were 
formed by changing a critical noun, adjective, or object. 
For example, 


The rubber duck was wet. 
Was the rubber duck dry? 


Procedure. Children were tested individually in an
isolated room within their school. Children were told 
that the purpose of the experiment was to find out 
“some things about how children your age read.” They 
were also told that a statement would appear on the 
screen. After children had read the statement, they 
were to push the large button on the response panel.
This would expose a question, which they should answer 
by pressing the appropriate response button. It was
emphasized that children should: (a) read the state- 
ments “like you would read something in school” and 
(b) answer the questions correctly, as rapidly as possible. 
Children read aloud 48 sets of statements and questions. 
Within each block of four questions, each type of 
statement-question combination appeared once in a 
constant random order. Within the three-statement 
condition, yes and no questions tapped information 
presented in the first, second, or third sentence equally 
often. Testing required approximately 10-20 min- 
utes. 

Apparatus. Statements and questions were pro- 
jected separately onto a screen approximately 50 cm 
from the child. Each presentation of a statement ac- 
tivated a photocell, starting a timer, which stopped 
when the slide went off. When a question appeared, a 
second photocell and timer assembly was activated, 
which stopped when the child pressed either response 
button mounted on a box placed on a desk where the 
child sat. Three buttons were mounted on the box. A 
large button near the top was used by the child to ad- 
vance the projector in order to expose the question; two 
smaller buttons in the middle of the panel were used to 
answer “yes” or “no.” Children pressed buttons with 
the index finger of the preferred hand. 


Results 


Two aspects of the data are to be consid- 
ered: accuracy and answering times. Grade 
was a factor in all of the analyses, but it will 
not be discussed further as it did not interact 
with any of the variables of interest. 

Accuracy. Children answered questions 
quite accurately (M = 92%). Accuracy was
greater on yes questions than on no ques- 
tions, F(1, 16) = 30.38, p < .01, and was 
greater following presentation of one state- 
ment than following three, F(1, 16) = 10.73, 
p < .01. The Answer Type × Number of
Statements interaction was also significant, 
F(1, 16) = 10.0, p < .01. Following the 
presentation of one statement, yes and no 
questions were answered equally well (98% 
and 97%, respectively); while following three 
statements, yes questions were answered 
more accurately (93% and 79%). 

More important for our purposes, how- 
ever, were two other findings. First, chil- 
dren in the two reading groups answered 
questions equally well, as the main effect of 
reading skill and all interactions involving 
this factor were nonsignificant (ps > .10). 
Second, there was no evidence of a speed- 
accuracy trade-off; higher error rates were 
associated with greater latencies for both 
groups of children. 
Contrary to our expectations, the interaction
of reading skill and number of statements
was not significant (F = 1.01). Two other
interactions were highly significant, however.
The Number of Statements × Answer Type
interaction was highly significant, F(1, 15)
= 42.21, p < .01. As is apparent in Figure 1,
yes questions were answered at approxi-
mately the same speed whether they fol-
lowed one or three statements; no questions
were answered much more slowly following
three statements. This general pattern of
results was found for both groups of readers.
However, the increase in latency for no
questions following three statements was
greater for less skilled than for skilled read-
ers, resulting in a significant Reading Skill ×
Number of Statements × Answer Type
interaction, F(1, 15) = 5.58, p < .05.


Experiment 2 


in that we presented two-state-
ment sets.


of six third- and four fourth-grade children who scored
in Stanines 6-7 on the Reading subtest of the Metropolitan
Achievement Test (MAT). The less skilled
Conroe and four -—
two groups of chi
matched on the basis of their scores on the mathe-
matics subtest of the MAT (Ms = 4.8 and 4.78,


sets of statements and questions generated specifically


Consisting of 12 pairs of statements

wa mit for each of the one’, tire and the

For six pairs in each condition,

the answer was yes; for six, it was no. Within
‘statements
statements, F(2, 36) = 5.25, p < .025: 98%,
94%, and 92% for the one-, two-, and three-
statement conditions, respectively. As in
Experiment 1, two other findings were of
importance. First, children in the two
reading groups answered questions equally
well, as the main effect of reading skill and
all interactions involving reading skill were
nonsignificant (Fs < 1.43). Second, there
was no evidence of a speed-accuracy trade-
off, since across conditions, greater accuracy
was associated with faster answering
times.

Answering times. Six median answering
times were computed for each child, one
corresponding to each of the Answer Type ×
Number of Statements combinations. As in
Experiment 1, skilled readers read more
rapidly than less skilled readers (average
reading time per sentence of 6.65 sec and
10.6 sec, respectively). Consequently, an
analysis of covariance was computed, with
reading speed as the covariate and reading
skill, answer type, and number of statements
as factors. Skilled readers answered ques- 
tions .67 sec more rapidly than less skilled 
readers, F(1, 17) = [value illegible], a difference
comparable to the .64-sec difference
in Experiment 1. The Answer Type X Number of Statements
interaction was significant, F(2, 35) = [value illegible],
p < .01. Yes questions were answered at
about the same rate regardless of the
number of statements, while no
questions were answered more slowly as the
number of statements read increased.
As is evident from the data in Figure 2,
these effects differed for the two
groups of children, resulting in a significant
Reading Skill X Number of Statements X
Answer Type interaction, F(2, 35) = [value illegible],
p < .05. Skilled readers clearly were unaf-
fected by the number of sentences on yes
questions and only slightly on no questions.
(The simple interaction of number of state-

is presented, all statements in working
memory are scanned simultaneously to de- 
termine if they have information relevant to 
the question presented. When the partic- 
ular combination of words (or their semantic 
equivalent) in the question is matched in 
memory, the child answers “yes.” The fact 
that no questions are affected by the number 
of statements presented suggests sequential 
processing of statements. How might this 
occur? Search of memory following a no 
question will not result in a match. We 
argue that instead of immediately respond-
ing “no,” a child rechecks all information in
working memory sequentially to verify the 
results of the initial simultaneous search. 
Increases in latency as a function of the 
number of statements read simply reflect the 
extra time required to scan each additional 
statement stored in working memory. 
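
The two-stage account described above can be made concrete with a small sketch. It is an illustration only: the function name, the parameter names, and the values of `base` and `scan_cost` are hypothetical and are not estimates from the data. The sketch simply encodes the claim that yes answers follow a simultaneous search whose cost does not grow with the number of statements, whereas no answers add a sequential recheck of each statement.

```python
def predicted_latency(n_statements, answer_is_yes, base=1.0, scan_cost=0.2):
    """Predicted answering time (sec) under the two-stage account.

    base and scan_cost are hypothetical values chosen for illustration.
    """
    # Stage 1: all statements are searched simultaneously, so this cost
    # does not depend on how many statements are held in working memory.
    latency = base
    if not answer_is_yes:
        # Stage 2: no match was found, so each statement is rechecked in turn
        # before the child responds "no".
        latency += scan_cost * n_statements
    return latency


for n in (1, 2, 3):
    print(n, predicted_latency(n, True), predicted_latency(n, False))
```

Under these assumptions the predicted yes latencies are flat across the one-, two-, and three-statement conditions, while the predicted no latencies rise linearly, which is the qualitative pattern the account is meant to explain.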

Why should this second search be in- 
voked? It is important to remember that in 
the present experiments, in contrast with 
most experiments involving memory search 
for digits (Sternberg, 1966) or sentences (e.g., 
Anderson & Bower, 1973), each statement 
was read only once and was not in any sense 
familiar. The task was further complicated 
by the fact that in the multiple-sentence 
conditions, the sentences were unrelated. 
Thus, we suggest that it is quite likely that 
children invoked a “conservative” processing 
strategy, one designed to maximize the 
likelihood of a correct response at the ex- 
pense of some speed. 

The pattern of results for skilled and less 
skilled readers suggests that both groups of 
children pursued the strategies of (a) si-
multaneous search following yes questions
and (b) simultaneous plus sequential search
following no questions. The groups differed 
primarily in the rate with which these 
searches of memory were executed; in both 
cases, skilled readers were faster. In other 
words, it would appear that skilled and less 
skilled readers differ quantitatively but not 
qualitatively in the ways in which they
search memory.


Experiment 3


We have shown in Experiments 1 and 2 
that skilled and less skilled readers differ in
the rate with which they scan memory for
information necessary to answer questions.
In both experiments, children read state- 
ments and questions aloud. The purpose of 
Experiment 3 was to determine if compara- 
ble results would be obtained with silent 
reading. Children were tested only in the 
one-sentence condition. 


Method 


Participants. Two groups of eight third graders, 
equated in IQ but differing in reading ability, partici- 
pated. Skilled readers scored at Stanines 6-9 on the 
Reading subtest of the MAT; the less skilled readers, 
at Stanines 3 and 4. The children in the two groups 
were matched for IQ, as measured by the Otis-Lennon 
intelligence test. Mean IQs for the skilled and less 
skilled readers were, respectively, 109 and 113. 

Procedure. The apparatus and procedures from the 
previous experiments were used. A set of 48 statements 
and 48 accompanying questions were generated and 
presented in a constant random order. Within each 
block of four questions were two yes and two no ques-
tions. Half the children in each group read the first 24
statements and questions aloud, then the next 24 sil-
ently; half read silently first, then aloud. Testing re- 
quired 10-15 minutes. 


Results and Discussion 


Two aspects of the data will be considered:
children’s accuracy in answering questions
and their answering times.

Accuracy. Children answered questions 
quite accurately (M = 93%). Analysis of 
variance of these data produced Fs < 1 for 
the effects of reading skill and mode (oral or 
silent) and all interactions involving these 
variables. Children were slightly more ac- 
curate on yes questions (97%) than no 
questions (89%), F(1, 14) = 6.71, p < .05. 

Answering time. Four median answering
times were computed, one corresponding to
each of the Mode X Answer Type combina-
tions, based on times for correct responses
only. To adjust for the effect of reading time
on answering time, an analysis of covariance
was computed. When reading time was
covaried, skilled readers still answered
questions .62 sec more rapidly than less
skilled readers, F(1, 14) = 10.46, p < .01.
Questions were answered slightly more
rapidly following silent reading, F(1, 13) =
3.43, p < .10. All other main effects and
interactions were nonsignificant. Of primary
importance, the Reading Skill X Mode in-
teraction was not significant (F < 1), re-
flecting the consistency of the difference in 
answering times between skilled and less 
skilled readers in oral and silent reading (.56 
sec and .67 sec, respectively). 


Experiment 4 


We have argued that skilled readers an- 
swer questions more rapidly than less skilled 
readers, in part, because skilled readers 
search memory more rapidly. If we are 
correct, then differences in answering time 
should be eliminated if a search of memory 
is not required to answer a question. This 
prediction was tested in Experiment 4 by 
modifying the procedure used in Experi- 
ments 1-3. Each child saw several sets of 
three slides. The first slide consisted of a 
sentence; the second, a question; the third, 
a single word. The child was asked to verify
if the word on the third slide was the ap-


propriate answer to the question posed in the 
second slide. For example, the following was 
one of the sets. 


1. The man drank the milk.
2. What did the man drink? 
3Y. Milk. 

3N. Water. 


As in Experiments 1-3, the child read the 
first sentence at his or her own rate, then 
advanced the projector, which exposed a 
question. In contrast to the previous ex- 
periments where children were encouraged 
to answer as quickly as possible, children 
were now told to think what the answer to 
the question would be and “keep that word
in mind.” When ready, the child was to
advance the projector a second time and 
judge if the word projected was correct (as in 
3Y above) or incorrect (as in 3N). 

We assumed that in this procedure, 
memory search occurs upon presentation of 
the question in the second slide. Therefore, 
answering times obtained upon presentation 
of the third slide reflect encoding, decision 
and response execution times, but little or no 
memory search time. Accordingly, we 
would expect latency differences between
skilled and less skilled readers to be mini-
mal.


Method


The eight skilled and eight less skilled readers from 
Experiment 3 were retested. The procedure and ap- 
paratus were essentially those from Experiments 1-3. 
Two practice sets of a sentence, question, and word were 
followed by 24 test sets. The correct answer was yes for
12 sets and no for 12 sets. A yes and no set appeared 
twice in a random order in each block of four sets. All 
materials were read aloud. 


Results 


Children answered questions quite accu-
rately (M = 97%). No differences in accu-
racy were found between the groups of chil- 
dren or for the two types of decisions. Me- 
dian latencies were computed separately for 
yes and no decisions for both study and 
correct answering times. Skilled readers 
read more rapidly than less skilled readers, 
but the two groups of children spent similar 
amounts of time studying the questions (F
< 1). More important were the results of the
analysis of answering times, in which reading 
skill and answer type were factors and 
reading speed was the covariate. Skilled 
readers evaluated the correctness of the word 
in the third slide approximately .1 sec faster 
than less skilled readers, a nonsignificant 
difference (F < 1). Furthermore, the same
result was obtained when the analysis was 
computed with study time as the covariate. 
Thus, the .5-sec difference in answering time 
between skilled and less skilled readers 
found in the previous experiments was re- 
duced to .1 sec when the memory search 
component was minimized. 


General Discussion 


The purpose of these experiments was to 
investigate differences between skilled and 
less skilled readers in the way memory is 
searched for information needed to answer 
a simple question. Evidence for such a dif-
ference was found in each experiment. Two 
aspects of these results deserve comment: 
(a) the factors that might give rise to this 
difference between skilled and less skilled 
readers and (b) how the differences in 
question-answering skills studied here might 
occur in the more routine aspects of read-
ing.

Why should less skilled readers search
memory more slowly than skilled readers?
It might be the case that less skilled readers 
suffer from some sort of processing deficit 
that affects rate of search of all materials. 
This alternative seems unlikely, as Katz and 
Wicklund (1971, 1972) have shown that 
skilled and less skilled readers search visual 
displays of letters or words at approximately 
the same rate. Following Perfetti and 
Goldman (1976), we suggest that a more 
likely source of the difference in rate of 
memory search is the manner in which in- 
formation is encoded originally in working 
memory. Perhaps less skilled readers tend 
to encode sentences verbatim, while skilled 
readers typically encode “a syntactically 
bare” version of a sentence that preserves 
important semantic relations. Such a dif- 
ference might well account for the difference 
in memory search found here. Search of 
each statement takes longer for less skilled 
readers because their stored representations 
of statements contain more items and, hence, 
more interconnecting links. Less skilled 
readers might answer more slowly because 
they must match more elements between 
question and statement before responding. 
We would argue that the results found in 
the present research are not limited to the 
search processes necessary for answering 
questions; rather, they may shed light on the 
ways in which readers of differing levels of 
skill comprehend information upon initial 
reading. Comprehension requires that an 
individual relate new information to infor- 
mation that is already known (i.e., is in 
working or long-term memory). In other 
words, upon presentation of a word or sen- 
tence, a reader presumably must search 
memory to find information that defines, 
describes, interprets, or otherwise renders 
meaningful the target word or sentence. 
The process can be seen clearly in (but is 
most certainly not limited to) the under- 
standing of pronouns. Consider the fol- 
lowing sentences: 


1. Jeff dove into the pool. 
2. He swam a mile.


To understand Sentence 2, it is necessary to
identify “He” as a pronoun used to refer to
a male and then scan Sentence 1 for an ap- 
propriate antecedent. It would seem rea- 
sonable that the rate with which these scans 
of memory are completed is one of the fac- 
tors determining the rate with which Sen- 
tence 2 is understood. From our results, 
then, it seems plausible to suggest that the 
speed of memory search may be one factor 
contributing to ability differences in reading 
comprehension. 

A final comment is needed concerning the 
limiting conditions of the findings reported 
here. The terms skilled and less skilled 
readers were selected intentionally: All 
children in the samples tested could read, 
albeit with varying facility. It is an open 
question whether our results and conclusions 
would apply to reading disabled children and 
other groups of children who experience
difficulty while reading or in learning to
read.


Student, Peer, and Self Evaluations
of College Instructors 


Kenneth O. Doyle, Jr., and Leslie I. Crichton 


University of Minnesota 


This paper compares student, peer (or colleague), and self ratings in terms of 
item statistics, convergent and discriminant validity, and relation to student 
learning. Ratings from the three sources were similar in range and distribu- 
tion, although colleagues tended to give the most favorable ratings, students 
the least favorable. Individual student and colleague reliabilities were also
similar; composite student reliabilities were considerably higher than compos-
ite colleague reliabilities, only partly because of differing sample sizes. Stu-
dent and self ratings and rankings were quite good in terms of convergent and
discriminant validity, but no student, peer, or self rating was significantly re-
lated to residualized student achievement.


Among the measures most frequently
proposed for use in the evaluation of college
instruction are student, peer, and self rat-
ings.

Student ratings have been subject to 
considerable research, especially in recent 
years. Reviews by Costin, Greenough, and 
Menges (1971) and Doyle (1975) have con- 
cluded that student ratings are quite reliable 
and reasonably valid measures of at least 
some important aspects of teaching. Nev- 
ertheless, controversy continues about the 
use of student ratings, particularly in pro-
motion and salary decisions. Much of the
controversy relates to the relationship (or
lack thereof) between student ratings and 
tested student learning (cf. Doyle & Whitely, 
1974; Elliott, 1950; Frey, 1973; Remmers, 
Martin, & Elliott, 1949; Sullivan & Skanes, 
1974; Centra, Note 1; and, in contrast, Rodin 
& Rodin, 1971).

Requests for reprints should be sent to Kenneth O.
Doyle, Jr., Measurement Services Center, University
of Minnesota, 9 Clarence Avenue, Southeast, Minne-
apolis, Minnesota 55414.

Peer and self ratings have been much less
thoroughly studied, even though each of
these measures is often used in the evalua-
tion of instruction, including promotion and
salary decisions (Astin & Lee, 1966). Mas-
low and Zimmerman (1956), Murray (Note
2), and Blackburn and Clark (1975) found
correlations of .69, .87, and .62 between col-
league and student ratings of overall effec-
tiveness. Webb and Nolan (1955) reported
a correlation of .62 between self and student 
ratings, but Clark and Blackburn (Note 3) 
found a correlation of .19. Clark and 
Blackburn also reported a correlation of .28 
between self ratings and colleague ratings. 
Centra (Note 4, Note 5) found peer ratings 
less reliable and more lenient than student 
ratings and self ratings sometimes the same 
as, sometimes lower than, but usually higher 
than student ratings; he also found self rat- 
ings and student ratings very similar in 
pattern of strengths and weaknesses across 
items. Doyle and Webber (in press) found 
no identifiable instructor characteristic that 
seemed to bias self-ratings and noted that 
instructors who describe themselves as bet- 
ter teachers are those who report success in 
getting students interested in the material 
and who say that they themselves enjoy 
teaching and like the course material. But 
self-ratings will probably always be suspect 
because of assumed leniency effects (Kulik 
& Kulik, 1974), and peer ratings will be sus- 
pect because of limited opportunity to ob- 
serve classroom instruction (Centra, 1973). 

Despite the emphasis on student learning 
as a criterion for instructor ratings, no study 
could be located that compared either peer 
or self ratings with student learning. 

The purpose of the present study, then, 
was to compare student ratings, peer ratings, 
and self-ratings of college instruction.
Three avenues of comparison are possible.
First, the three kinds of ratings may be
compared in terms of item statistics, such as
discrimination and distribution shape.
Second, the ratings may be examined for
convergent and discriminant validity in the
framework of a multitrait-multimethod
matrix (Campbell & Fiske, 1959). And 
third, the ratings may be compared in terms 
of their relationship to student learning. 

The present study employs all three 
comparisons and, in the subsequent discus- 
sion, examines the strengths and weaknesses 
of each kind of comparison. 


Method 


Design 


The design of this study required a multisection 
course with a common syllabus, common readings and 
other assignments, and a common final examination 
and in which the section instructors were also respon- 
sible for the actual teaching (as distinguished from, say, 
leading recitations). The course had to be one from 
which the following information could be obtained: 
student ratings of the instructors, self-ratings by the
instructors, peer ratings by the various instructors of
one another, students’ final examination scores, and
scores on a premeasure of ability. Although it was ex-
tremely difficult to locate such a course, introductory 
communications at the University of Minnesota closely 
approached all of these requirements. Introductory 
communications deals with elementary aspects of 
composition, speech, and linguistics. It satisfies a part 
of the graduation requirement for liberal arts under- 
graduates. 


Subjects 


The total number of sections in introductory com- 
munications was 18; the total number of instructors, 14; 
the average number of students per section about 21. 
Six of these instructors taught 2 sections each; in these 
cases, 1 of the 2 sections was eliminated at random.
Student ratings were unavailable for 2 instructors, and
for 1 of those 2, the self and peer ratings were also un-
available; however, both of these instructors were re-
tained as ratees for the peer ratings.

Accordingly, the subjects for the student ratings were
the 263 students and 12 instructors from whom
data were available. The raters for the self-ratings and
peer ratings were 9 of those 12 instructors, and the
ratees for the peer and self-ratings were all 14 instruc-
tors. It should be noted that the instructors were
mostly advanced graduate students who, by virtue of 
office arrangements and involvement in instructional 
development seminars, may have been somewhat more
acquainted with one another than “typical” faculty in
large academic departments.


Instruments and Procedures


Student ratings. Students rated their instructors
in class after the final examination for fall term, 197[?].
Six rating items were used. Four of these, the “specific”
items, were the highest loading items from the four most
stable factors identified by Doyle and Whitely (1974).
These items were: “clearly presented the subject
matter”; “was approachable”; “got students interested
in the subject matter”; and “raised challenging ques-
tions.” The remaining two items were summary or
“general” items taken as overall evaluations of teaching
process and teaching outcome, respectively: “How
would you rate this instructor’s overall teaching abili-
ty?” and “How much would you say you learned from
this instructor?” Each item was answered on an ap-
propriate 7-point scale. The questionnaire also con-
tained items asking for the students’ year in school, their
sex, and their liking for the subject matter of the
course.

To facilitate matching ratings with examinations,
students were asked to sign their names to the ques-
tionnaires but were assured strictest confidentiality.
To further reduce the likelihood that signing their
names would influence their ratings (Sharon, 1970), a
somewhat elaborate procedure was devised. A student
collected the completed questionnaires in class, placed
them in an envelope, and delivered that sealed envelope
to a departmental secretary. The investigators then
collected the still-sealed envelope. Although it ap-
peared that the students were comfortable with this
procedure, it is possible that identification may have
had some influence on the level, perhaps even on the
pattern, of ratings.

Self and peer ratings and rankings. A peer- and
self-evaluation booklet was especially prepared. Each
instructor used this booklet to rate one colleague at a
time on each of the six items just described for the stu-
dent ratings. Each page of the booklet dealt with one
instructor; the order of presentation of instructors was
randomly determined for each booklet. The last rating
was a self-rating on each of the six items. The sec-
ond-to-last page of the booklet had the instructors rank
their colleagues from best to worst on “overall teaching
ability.” The last page asked the raters to place their
own names in that ranking.

These peer and self evaluations were done privately
at about the same time as the student ratings. Strict
confidentiality was assured, and the instructors mailed
the evaluations directly to the investigators. Perhaps
more so than with the student ratings, it is possible that
the peer ratings and especially the self ratings may not
generalize to situations in which confidentiality is not
guaranteed or in which the ratings may be used in pro-
motion and tenure decisions.

It is important to note that because classroom visi-
tation is not a widely accepted practice (Astin & Lee,
1966), these instructors were not asked to visit one an-
other’s classes. Rather, to keep conditions as realistic
as possible, they were instructed to rate their colleagues’
probable classroom presentation by generalizing from
such routine experiences as behavior at faculty meet-
ings, colloquia, and social gatherings. The peer
evaluations, then, were inferences about classroom in-
struction rather than in-class observations. As previ-
ously suggested, the cohesiveness of this group of in-
structors may have led to an unusual degree of knowl-
edge about one another’s probable teaching behav-
iors.

Final examinations. Instructors and investigators 
collaborated on the preparation of an objective final 
examination to be used in addition to the usual essay 
examination. Each instructor rated several dozen 
course topics in terms of the emphasis given those topics 
in his or her section of the course. A list of topics of 
greatest and approximately equal emphasis across 
sections was prepared from those ratings. The inves- 
tigators, with the help of a subcommittee of the in-
structors, then prepared a pool of objective items for
those topics. From that pool, the instructors as a group
chose the final set of items. The result was a 31-item
objective examination with an internal consistency 
(Cronbach’s alpha) of .81 and a median difficulty level 
of .70. 
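
For readers unfamiliar with the two indices just cited, the following sketch shows how an internal consistency coefficient (Cronbach's alpha) and item difficulty (proportion correct) are conventionally computed from a matrix of item scores. The function names and the small data set are hypothetical and serve only as an illustration; they are not the examination data from this study.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows of item scores."""
    k = len(scores[0])  # number of items
    # Variance of each item across examinees.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the total (summed) score across examinees.
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def item_difficulties(scores):
    """Proportion of examinees answering each 0/1-scored item correctly."""
    n = len(scores)
    k = len(scores[0])
    return [sum(row[i] for row in scores) / n for i in range(k)]


# Hypothetical 0/1 scores for four examinees on three items.
example = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
print(cronbach_alpha(example))
print(item_difficulties(example))
```

The same ratio of item variances to total-score variance applies whether population or sample variances are used, provided the choice is made consistently.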

Premeasures of student ability. Verbal scores from 
the Preliminary Scholastic Aptitude Test (PSAT), 
available from university admissions files, were selected 
as premeasures of student ability. 


Analyses and Results 


Preliminary Analyses 


Certain preliminary analyses were per- 
formed to examine demographic similarity 
among the students in the different sections 
and to determine the discriminating power 
of the various ratings. 

Student demographics. One-way uni-
variate analyses of variance indicated no
differences across sections on student sex (p
= .70), year in school (p = .38), or liking for
the subject (p = .17). These analyses 
suggest that the student groups were similar 
to one another in important respects. 

Student ratings. A one-way multivariate 
analysis of variance testing the four specific 
and two general ratings for differences across 
sections was significant at p < .001. Table 
1 shows that subsequent univariate analyses 
of variance (ANOVAs) on each of the six
items were all significant (p < .001). These
analyses indicate that the student ratings
differentiate among instructors. 

Colleague ratings. Parallel analyses were
performed on the peer ratings. Table 1 also
shows that the colleague multivariate anal-
ysis of variance (MANOVA) was significant
at p < .001 and that four of the six subse-
quent ANOVAs were significant at p < .03.
For the colleague ratings, then, “clearly
presented the subject matter,” “raised
challenging questions,” “overall teaching
ability,” and “how much students learned”
all discriminated, but “was approachable”
and “got students interested” did not. It
should be noted that these are probably
conservative estimates of the actual signifi-
cance levels, because the more appropriate
repeated measures analysis was not feasible
owing to the rather small number of raters
relative to the number of ratees.


Table 1

Multivariate Analyses of Variance
(MANOVAs) of Student Rating Items and of
Colleague Rating Items With Univariate
Follow-Up Analyses (ANOVAs)

Analysis and item                                  p for colleagues

MANOVA                                             .001
ANOVA
  1. Clearly presented subject matter              .001
  2. Was approachable                              .14
  3. Got students interested in subject matter     .11
  4. Raised challenging questions                  .03
  5. Overall teaching ability                      .03
  6. How much students learned                     .02

Note. For students, the MANOVA and the ANOVAs for each
item were significant at p < .001.

Self-ratings. Although an explicit test of 
discrimination among self-ratings was not 
possible, the standard deviations and fre- 
quency distributions included in Table 2 
provide some indication of discrimination in 
self-ratings. The variation, however, is en- 
tirely among points on the positive end of the 
scale. 


Item Statistics, Reliability, and 
Agreement 


Student, peer, and self ratings were com- 
pared next for item statistics and for reli- 
ability and agreement. 

Item statistics. Table 2 indicates that 
mean ratings are fairly similar from students, 
colleagues, and instructors themselves, the 
most extreme pairs differing by .6-.8 units; 
that, within this range, students tend to give 
the least favorable ratings, colleagues the 
most favorable; and that most ratings, from 
whichever source, show similarly little skew.


Table 2

Item Statistics for Student, Colleague, and Self Ratings

Item and source               n      M      SD     Skew

1. Clearly presented
     Students                 ?      4.49   1.15   ?
     Colleagues               ?      ?      1.00   -.28
     Self                     10     4.90   .77    ?
2. Was approachable
     Students                 383    5.25   1.32   ?
     Colleagues               ?      5.29   1.08   -.39
     Self                     10     5.90   1.10   ?
3. Got students interested
     Students                 ?      4.57   1.31   -.02
     Colleagues               ?      ?      ?      ?
     Self                     ?      ?      0.82   ?
4. Raised challenging questions
     Students                 ?      4.42   1.32   ?
     Colleagues               ?      5.28   1.02   -.19
     Self                     10     5.00   0.82   .00
5. Overall teaching ability
     Students                 379    4.67   1.36   -.24
     Colleagues               ?      5.27   1.00   -.38
     Self                     10     4.90   0.74   ?
6. How much students learned
     Students                 382    4.56   1.23   -.46
     Colleagues               ?      5.37   0.85   -.40
     Self                     ?      ?      ?      ?

Note. The frequency-distribution columns (ratings 1 through 7) and the entries shown as ? are illegible in the source.


The items that departed most from this 
pattern—“was approachable” and “got 
students interested”—were those that in the 
colleague ratings were not discriminating. 
Interrater reliability and agreement. 
Reliability and agreement indices were 
computed for the four specific and two gen- 
eral items. Table 3 presents reliabilities 
computed according to Ebel's (1951) for- 
mulas for average individual reliability (R;) 
and reliability of mean or composite ratings 
(Rc). The student composites averaged 
about 21 raters per section per item, and the 
colleague composites were the means of eight
colleagues' ratings. 
Individual reliabilities were about the 
same for colleagues and students, though 
perhaps slightly higher for students on “was 


approachable” and “got students inter- 
ested.” On four of the items, composite 
reliabilities were somewhat higher for stu- 
dents than for colleagues; on two of the 
items—again, “was approachable” and “got 
students interested”—the student relia- 
bilities were substantially higher. 

Because these composite reliabilities arè 
a function of the number of raters, the 
Spearman-Brown prophecy formula was 
used to estimate composite reliabilities whe” 
the number of colleagues was increased t0 
equal the number of students and also whe? 
the number of students was decreased 
equal the number of colleagues. These re- 
sults are also presented in Table 3 (Re co 
umn). When the number of students an 
colleagues were made equal, the differences 
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- Table 3 


Student and Colleague Interrater Reliability and Agreement on Common Rating Items 


Students Colleagues 
Item Ri; Rc Rc T Rr Re Rc T 
1. Clearly presented subject matter 14 -78 57 25 16 .63 .80 57 
2. Was approachable 20 .84 67 33 .05 .32 .53 .43 
3. Got students interested 44 17 57 .08 06 —.35 57 Ni 
4. Raised challenging questions AL NI .50 16 -10 50 -70 57 
. “Overall teaching ability d4- THE BITS 18. . 41  .52| 172 — 97 
How much students learned Bt] 17 54 25 10-50 0 —.78 


greater-than-chance agreement.) 


in composite reliability vanished for four of 

| the six items but remained for the two items 
for which the original differences were most 
pronounced, that is, “was approachable” and 
“got students interested.” 

The Lawlis-Lu test of random agreement 
< I^(Lawlis & Lu, 1972) was computed for each 
rating by each source, using the eight col- 
leagues’ ratings and a random subset of eight 
students' ratings and defining the discrep- 
ancy as no more than a 2-point difference 
among raters. Because there was significant 
agreement in every case, the measure of 
agreement proposed by Tinsley and Weiss 
(1975) was also computed, with the results 
also presented in Table 3. Agreement 
among colleagues was consistently and, in all 
cases except “was approachable,” substan- 
tially higher than among students. 


Convergent and Discriminant Validity 


Because different sources, or methods, 
were used to rate different instructor traits, 
the intercorrelations among student, col- 
league, and self ratings can be drawn to- 
gether in a multitrait-multimethod matrix 
(Table 4; Campbell & Fiske, 1959; Centra, 
1971). In a similar vein, Table 5 presents 
the correlations of student, colleague, and 
self ratings with colleague and self rank- 
ings. 

Note. Ry = average individual reliability; Rc = reliability of mean or composite ratings; Rc' = composite reliability when number
of students and number of colleagues were made equal. T = extent of interrater agreement. (Positive values of T indicate

In computing the correlations presented
in Tables 4 and 5, the relatively small
number of data points (a perennial problem in
between-sections analyses; cf. Cronbach,
1976) led to the use of the Kendall rank
correlation rather than the Pearson
correlation, because the latter's sampling
distribution is highly unstable for small N, owing
to its sensitivity to extreme scores. The
sampling distribution of the Kendall
correlation, on the other hand, is determined
by the probabilities of the various possible
orderings and so is not unduly influenced by
extreme scores.
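The contrast drawn above between the two coefficients can be illustrated with a minimal sketch. The data are hypothetical (six invented section means, not values from this study), and the tau-a variant without tie correction is an assumption, since the text does not say which variant was computed. Replacing one score with an extreme value that preserves the ordering leaves the Kendall coefficient unchanged but shifts the Pearson coefficient.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs (no tie correction)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical section-level data: mean rating and mean exam score.
ratings = [3.1, 3.4, 3.6, 3.8, 4.0, 4.2]
scores = [60, 62, 61, 65, 64, 66]
# Same ordering, but the last score is now an extreme value.
scores_extreme = scores[:-1] + [120]

for label, s in (("original", scores), ("extreme", scores_extreme)):
    print(label, round(kendall_tau_a(ratings, s), 3), round(pearson_r(ratings, s), 3))
```

Because tau depends only on the ordering of the pairs, the extreme score changes nothing for it, whereas the product-moment coefficient responds to the size of the deviation.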

Although the original Campbell and Fiske 
(1959) paper described criteria for inspecting 
multitrait-multimethod matrices, Kavan- 
agh, MacKinney, and Wolins (1971) have 
provided an analysis-of-variance procedure 
that is more objective, parsimonious, and 
convenient than simple inspection. This 
procedure enables one to analyze a matrix in 
terms of convergent validity (ratee effect), 
discriminant validity (Ratee X Items), halo 
(Ratee X Rater), and error (unknown sources 
of variance). The procedure also provides 
variance components that indicate the 
amount of variance attributable to each 
source. Table 6 presents the results of these 
analyses. 

For the overall Students X Self X Col- 
leagues matrix, the ratee and Ratee X Rater 
effects were both significant (p < .001), but 
the Ratee X Item effect was not significant. 
This indicates significant convergent validity 
and significant halo, but nonsignificant 
discriminant validity among the student, 
self, and colleague ratings. Accordingly, the 
variance components indicate a considerable 
degree of convergent validity (.42) and a 
similar degree of variance from unknown 
sources (.46). The variance component for 
halo (.24) shows a substantial amount of 
method error, but considerably less method
error than convergent validity. The near-zero
variance component for discriminant
validity indicates either that there is much
less discriminant validity than convergent
validity in these ratings or that discriminant
validity varies across traits.

Table 5
Correlations of Colleague and Self Rankings
With Student, Colleague, and Self Ratings

                                           Rankings
Item                                  Colleagues     Self
Student ratings
  1. Clearly presented subject matter    .35          .43
  2. Was approachable                    .08          .14
  3. Got students interested             .11          .43
  4. Raised challenging questions        .28          .43
  5. Overall teaching ability            .49*         .52*
  6. How much learned                    .35          .43
Colleague ratings
  1. Clearly presented subject matter    [illegible]  .55*
  2. Was approachable                    .32          .22
  3. Got students interested             .66**        [illegible]
  4. Raised challenging questions        .68**        .84**
  5. Overall teaching ability            [illegible]  .64**
  6. How much learned                    .69**        .82**
Self ratings
  1. Clearly presented subject matter    .53*         .50*
  2. Was approachable                    .28          .51*
  3. Got students interested             .19          .08
  4. Raised challenging questions        .29          .54*
  5. Overall teaching ability            .46*         .50*
  6. How much learned                    .65**        .65*

Note. Correlation between colleague rankings and self ratings
equals .64 (p < .05).
* p < .05.
** p < .01.

Table 6 also shows that the findings from 
the Colleague X Self and Student X Col- 
league analyses are very similar to those from 
the overall analysis. 

For the Student X Self comparison,
however, all three principal effects are
significant, and the variance components indicate
considerable convergent validity, some
degree of both discriminant validity and halo,
and a substantial portion of variance
attributable to unknown sources.

According to Kavanagh, MacKinney, and 
Wolins (1971), the more traditional visual 
inspection of a multitrait-multimethod 
matrix may be used to complement these 
analyses of variance findings. The tradi- 
tional procedure seems especially useful here 
for locating instructor traits rated with 
greater or lesser degrees of validity. 
Campbell and Fiske (1959) point out that 
convergent validity is indicated by signifi- 
cant correlations in the validity diagonals 
(boldface in Table 4). In Table 4, five of the 
six correlations in Self X Student validity 
diagonals are significantly different from 
zero; two of the Self X Colleague validities 
and the same two of the Student X Colleague 
validities are significant, and two more Self 
X Colleague validities approach significance. 
That is, ratings from students, colleagues, 
and the instructors themselves converge for 
“raised challenging questions” and “how 
much students learned,” and ratings from 
students and the instructors themselves 
converge for all items except “got students 
interested.” Ratings from colleagues and 
the instructors themselves are also border- 
line convergent for “clearly presented the 
subject” and “overall teaching ability.” 

Campbell and Fiske (1959) also describe 
various procedures for estimating discrimi- 
nant validity. The most straightforward of 
these requires that the validity correlation 
for a variable (boldface in Table 4) be higher 
than the correlations between that variable 
and any other variable having neither trait 
nor method in common (enclosed in triangles 
in Table 4). In these terms, only a few of the 
“trait-method units” show more than a little 
discriminant validity. For example, the 
fewest correlations in excess of the validity 
correlations occur in the Self X Student tri- 
angles; there, for all items except “got stu- 
dents interested,” only one or two of each set 
of heterotrait correlations are elevated above 
the validity correlations. In the Colleague 
X Student triangles, only “raised challenging 
questions” and “how much students 
learned” come close to meeting this discri- 
minant validity criterion, and in the Self x 
Colleague triangles, only “how much students
learned” meets the criterion. Thus it
seems that there is a considerable amount of
“method variance” in these data,
particularly when colleague ratings are involved.
Campbell and Fiske suggest additional
procedures, but these seem unduly
subjective in this situation; moreover, inspection
of the matrix in these additional ways does
not seem to change the conclusion in any
substantial way.

822 KENNETH O. DOYLE, JR., AND LESLIE I. CRICHTON

Table 6
Analyses of Variance and Variance Components for the Multitrait-Multimethod Matrix

                                                               Variance
Source                     SS       df      MS        F         component
Student X Peer X Self
  Ratee                 88.8192     11    8.0744   17.6953***     .42
  Ratee X Item          34.6896     55     .6307   [illegible]    [illegible]
  Ratee X Rater         42.2928     22    1.9224    4.2131***     .24
  Error                 50.1984    110     .4563                  .46
Student X Self
  Ratee                 71.2512     11    6.4773   16.4732***     [illegible]
  Ratee X Item          33.9840     55     .6178    1.5712*       [illegible]
  Ratee X Rater         17.1360     11    1.5578    3.9618**      [illegible]
  Error                 21.6288     55     .3932                  .39
Colleague X Self
  Ratee                 63.4896     11    5.7717   11.0336***     [illegible]
  Ratee X Item          32.6304     55     .5932    1.1340        .04
  Ratee X Rater         19.1088     11    1.7371    3.3207**      .20
  Error                 28.7712     55     .5231                  [illegible]
Student X Colleague
  Ratee                 64.0512     11    5.8228   14.0139***     .45
  Ratee X Item          29.9088     55     .5437    1.3085        .06
  Ratee X Rater         27.1872     11    2.4715    5.9482**      .34
  Error                 22.8528     55     .4155                  .42

* p < .05.
** p < .01.
*** p < .001.

Some information about the convergent
and discriminant validity of self and
colleague rankings can be extracted from Table
5. For convergent validity, both self and
colleague rankings correlate significantly
with all other rankings and ratings of overall
teaching ability. For discriminant validity,
self and colleague rankings correlate more
highly with student and colleague ratings of
overall teaching ability than with student,
self, or colleague ratings on most other traits.
These data suggest very good convergent
validity and even fairly good discriminant
validity for the self and colleague rankings.


Achievement Measures 


Student, self, and peer evaluations were
also compared for relationships to student
achievement as measured by the final
examination scores for the course and by the
final examination scores adjusted for
PSAT-verbal ability.

To obtain PSAT scores required written
authorization from each student. In all
sections of the course except one, at least 9
of the students supplied that authorization;
in the one section, however, nearly half of the
students declined to furnish the necessary
authorization. As a test of possible bias
within that section, the ratings by the
students who supplied the authorization were
compared to the ratings by the students who
did not. A multivariate analysis of variance
using all six rating items indicated no
difference between the groups (p = .34).
Accordingly, the achievement means are
apparently not biased by the use of only a
subset of the students in that section.

To obtain a more stable regression
equation, the decision was made to regress the
final examination scores on ability within the
entire sample rather than within each
section, a procedure appropriate only if the
hypothesis of homogeneity of slopes cannot
be rejected. Analysis of covariance based on
all sections that had both final and ability
means failed to reject that hypothesis and
thus permitted use of the overall regression
equation.
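
The adjustment described above can be sketched as follows: a single regression line is fitted to the whole sample, and each section's adjusted achievement is the mean of its students' residuals. This is a minimal illustration under stated assumptions; the records, the section labels, and the helper names `fit_line` and `residualized_section_means` are hypothetical and do not come from the study.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x


def residualized_section_means(records):
    """records: iterable of (section, ability, final_exam) tuples.

    Fits one overall regression of final exam on ability (pooled sample),
    then returns the mean residual for each section.
    """
    records = list(records)
    slope, intercept = fit_line([r[1] for r in records], [r[2] for r in records])
    residuals = {}
    for section, ability, final in records:
        residuals.setdefault(section, []).append(final - (intercept + slope * ability))
    return {section: sum(v) / len(v) for section, v in residuals.items()}


# Hypothetical data: (section, ability score, final examination score).
records = [
    ("A", 40, 70), ("A", 50, 78), ("A", 60, 85),
    ("B", 45, 68), ("B", 55, 80), ("B", 65, 92),
]
section_means = residualized_section_means(records)
print(section_means)
```

Using the pooled line is only defensible when the within-section slopes do not differ, which is the homogeneity-of-slopes condition the text tests with analysis of covariance.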
Table 7 presents the Kendall correlations
of PSAT-verbal ability, the final
examination, and the final examination adjusted for
PSAT-verbal, with student, peer, and self
ratings on each of the six items and with
colleague and self rankings on “overall
teaching ability.”

The only significant correlation between
student ability scores and any rating or
ranking indicated a negative relationship
between student PSAT ability and the
instructors' own estimates of how much their
students learned. After Remmers, Martin,
and Elliott (1949) and Elliott (1950), this
correlation may simply indicate that
instructors see the course as pitched to the less
able student.

The only significant correlation involving
the unadjusted final examination indicates
that colleagues see a tendency for the more
approachable instructors to have students
who perform better on the final.

No student, self, or colleague rating or
ranking was significantly correlated with
residualized achievement, although the signs
for the colleague correlations were uniformly
negative.


Discussion


The findings of the present study are
limited by the size and composition of the
samples and by a certain amount of “noise”
unavoidable in complex field studies of this
sort. Greater numbers of sections and
instructors would have produced more
replicable correlations, although the use of
Kendall’s correlations does reduce some of
the instability. Lack of absolute control
over the conditions of measurement suggests
the need for caution in the interpretation of
these results; on the other hand, this very
absence of control may make this study a
more valid test of real-life evaluation
procedures.

Under these limitations, the principal 
findings of this study were the following: 

1. Students, colleagues, and the instruc- 
tors themselves gave ratings that were fairly 
similar in mean, range, distribution, and 
skew, although colleagues tended to give the 
most favorable ratings, students the least 
favorable. This finding is somewhat at 
variance with Centra (1973) and may reflect 
greater cohesiveness among these instructors 
as well as the fact that all of them had had 
prior experience with student ratings. 

2. Individual student reliabilities were the 
same as or slightly higher than individual 
colleague reliabilities. Composite student 
reliabilities, prior to correction for differing 
numbers of students and colleagues, were 
substantially higher for student ratings than 
for colleague ratings. After the Spear- 
man-Brown correction, composite student 
reliabilities remained higher for “was ap- 
proachable" and “got students interested.” 
The uncorrected reliabilities are probably 
the more important, since rarely are there as 
many colleagues as students. These find- 
ings are quite consistent with Centra’s 
findings (Note 5), in which colleagues had 
had considerable opportunity to observe the
instructors in class. 

3. The student, colleague, and self ratings
were quite good in convergent validity; the 
student and self ratings were considerably 
better than the colleague ratings in discri- 
minant validity. The most validly rated 
items, in these terms, were “raised chal- 
lenging questions” and “how much students 
learned,” followed by “overall teaching
ability” and “was approachable.” Self and
colleague rankings of overall teaching ability
seemed to be somewhat better in convergent
and discriminant validity than self and
colleague ratings, probably because ranking
forces discrimination among instructors and
provides an explicitly comparative “man-to-man”
scale (Guilford, 1954, p. 269). In
their ratings and their rankings, the
colleagues were disadvantaged by the lack of
opportunity to observe in-class instruction;
but because in actual practice colleagues
rarely have the opportunity to observe one
another in class, this disadvantage seems a
realistic one. Given this disadvantage, it is
interesting to note that some of the colleague
ratings are convergent with self and student
ratings; this suggests that some in-class
instructor behaviors (for example, “raised
challenging questions”) are predictable
from out-of-class experiences.

4. No student, peer, or self rating or
ranking on any item was significantly
correlated with residualized achievement.
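
The Spearman-Brown correction mentioned in the second finding above can be illustrated with a short sketch. The formula is the standard one for the reliability of a composite of k parallel raters; the individual reliability of .25 and the class size of 30 are hypothetical illustrations, while the figure of eight colleagues is taken from the text.

```python
def spearman_brown(r_single, k):
    """Reliability of the mean of k raters, given single-rater reliability r_single."""
    return k * r_single / (1 + (k - 1) * r_single)


def single_rater_reliability(r_composite, k):
    """Inverse of the Spearman-Brown formula: step a k-rater composite down to one rater."""
    return r_composite / (k - (k - 1) * r_composite)


r_individual = 0.25        # hypothetical single-rater reliability
n_students = 30            # hypothetical class size
n_colleagues = 8           # eight colleagues rated each instructor

uncorrected = spearman_brown(r_individual, n_students)
equalized = spearman_brown(r_individual, n_colleagues)
print(round(uncorrected, 3), round(equalized, 3))
```

With equal single-rater reliabilities, the larger student group yields the higher composite reliability, and equating the number of raters removes that advantage; this is why the uncorrected and corrected comparisons in the text can differ.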

These findings for self ratings and rankings
were unexpected because of intuitive
and empirical reservations about self
evaluations (e.g., Centra, 1973; Kulik & Kulik,
1974). The findings for peer evaluations
were also unexpected, if for no other reason
than prior findings that colleagues and
students agree substantially on the attributes
of good teachers (Hildebrand, Wilson, &
Dienst, 1971; Lovell & Haner, 1955). The
findings for student ratings are somewhat
inconsistent with the majority of recent
studies, which have tended to find moderate
positive correlations, especially with ratings
of overall teaching ability (e.g., Doyle &
Whitely, 1974; Frey, 1973; Sullivan &
Skanes, 1974; and Centra, Note 1). One
reason for this inconsistency may be that this
study was performed in a different
discipline; the relationship between ratings and
learning may vary from one discipline to
another. A second reason can be inferred 
from Sullivan and Skanes (1974), namely 
that the relationship between ratings and 
learning may vary with the “psychological 
distance” between raters and ratees, from 


for example, the Sullivan and Skanes
study.

The dilemma posed by these findings is
that many of these ratings are reliable and
valid in convergent/discriminant validity
terms but none of them correlates well with
student learning. Extrapolating from 
Cronbach and Meehl (1955), this dilemma 
may indicate a deficiency in the ratings, a 
deficiency in the learning measure, a defi- 
ciency in the logic of or the statistical pro- 
cedures for relating ratings and learning, or 
deficiencies in all of these. 

Residualized achievement involves a final 
examination and a premeasure of ability, 
both of which in this study are fairly reliable 
and content-valid measures of important 
aspects of knowledge. However, these 
measures do not cover the full range of rele- 
vant knowledge nor do they deal with re- 
tention after the course, nor is the PSAT 
premeasure a sample from the same domain 
of knowledge as the final examination. 
Moreover, these problems of content are 
compounded by the problems of measuring 
gain (e.g., Glass, 1974; Harris, 1963). Con- 
sideration of criterion problems, then, may 
help to resolve the dilemma. 

Clarification of certain issues of logic and 
statistical procedure may also help. First, 
there is some question about whether 
learning really is an appropriate criterion for 
ratings, however intuitively appealing that 
may be (Doyle, 1975, p. 65; Remmers &
Brandenburg, 1927). In the same vein, 
ratings and learning may both be taken as 
independent indices of teaching quality, 
more appropriate for criterion development 
than for criterion validation (Blum & Nay- 
lor, 1968, Ch. 6). Second, standard corre- 
lation procedures may be testing inappro- 
priate kinds of relationships between ratings 
and learning. For example, to expect a 
positive correlation between rated “clarity” 
and “approachability” implies the hypoth- 
esis that the more clear or more approacha- 
ble instructors are, the more their students 
will learn. At least as plausible as such a 
linear relationship would be a curvilinear 
function: Perhaps instructors need only be 
clear enough or approachable enough to 
facilitate learning in their students; perhaps 
they can also be too clear or too approacha- 
ble. Linear correlations would not detect 
such nonlinear relationships. Similarly, the 
effects of factors that may be limiting the 
size of correlations between ratings and 
learning should also be examined, for
example, restriction of range in the subject
samples, restriction of metric on one or both 
variables, and so forth (cf. McNemar, 
1969). 
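
The point about curvilinear functions can be made concrete with a small sketch. The data are hypothetical: a centered "clarity" scale and a learning outcome that peaks at the middle of the scale. The relationship is perfectly systematic, yet the linear correlation is zero.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical centered clarity ratings and an inverted-U learning outcome:
# learning is highest at moderate clarity and falls off at both extremes.
clarity = [-3, -2, -1, 0, 1, 2, 3]
learning = [-(c ** 2) for c in clarity]

print(pearson_r(clarity, learning))          # linear correlation misses the pattern
print(pearson_r(clarity, [2 * c + 1 for c in clarity]))  # a truly linear relation
```

A check of this kind, or a plot of learning against each rating, would show whether a near-zero correlation reflects no relationship or a nonlinear one.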

Finally, student, peer, and self ratings may 
be deficient in that what they measure does 
not always bear directly on student learning. 
The extent to which this is a deficiency 
seems to depend to a considerable degree on 
one’s posture with respect to the issues 
raised in the paragraphs just preceding, on 
one’s judgment about the current quality of 
human perception as captured by ratings 
versus the current quality of learning as 
measured by tests, and even on one’s phi- 
losophy of education. In any event, all of 
these considerations argue for further work 
on ratings, on tests, and on the methods for 
their validation and development.

This study investigated the relationship between intelligence, field depen-
dence, leadership, and self-concept. Eighty-eight sixth-grade boys were given 
a self-concept test and an embedded figures test, and their IQ scores were ob- 
tained. Ten groups of four were formed consisting of children with different 
combinations of high and low scores on the IQ and cognitive style measures, 
and the groups worked on an unstructured construction task. Following each 
session, members of the group rated each other on leadership, and the percent- 
age of speech time for each person was obtained from tape recordings. Field 
independence was related to intelligence and self-concept, and analytic 
subjects exhibited more leadership than global subjects, while IQ did not
differentiate subjects on any variable.


One of the most widely researched cogni- 
tive styles has been that defined by Witkin 
and his associates (Witkin et al., 1954, 1962),
who were concerned with the dimension of 
field independence - field dependence and 
developed the concept of psychological dif- 
ferentiation. Witkin has defined a field- 
independent individual as analytic and dif- 
ferentiated; while he states that a field- 
dependent person is global and less differ- 
entiated. The degree of field independence 
is determined by tests such as the Embedded 
Figures Test (EFT) and the rod-and-frame 
test, which require the subject to separate in 
some manner a figure from its background. 
Utilizing a variety of personality assessment 
techniques, Witkin concluded that adults 
who are field independent on the perceptual 
tests tend to be more active and self-assured 
and have greater self-esteem and self- 
awareness than field-dependent persons. 
Results with children showed a similar pat- 
tern: Field-independent children exhibited 
more impulse control, independence, and 
self-confidence. 


This report was based on a doctoral dissertation 
submitted to Ohio State University, Department of 
Psychology. The author wishes to thank Malcolm 
Helper for his assistance throughout the project and in
the preparation of the manuscript. 

Requests for reprints should be sent to David A.
Hoffman, who is now at the Corrigan Mental Health
Center, 49 Hillside Street, Fall River, Massachusetts


In theorizing about the relation between 
perception and personality, Witkin has 
concluded that field-independent individ- 
uals are active persons who function with 
relatively little support from the environ- 
ment, show initiative, and have the capacity 
to organize and master social and environ- 
mental forces. The field-dependent person, 
because of passivity, needs environmental 
support, does not initiate activity, and sub- 
mits to the forces of authority. 

The finding that field-independent indi- 
viduals are more active, self-confident, and 
have higher self-esteem suggests that they 
would be better able to take a leadership role 
than those who are field dependent. Certain
attributes of leaders have been found that 
are similar to those of field-independent 
persons. Reviews of the literature by Mann 
(1959) and Stogdill (1948) indicate that it
has been consistently found that the leader 
in a group is the most active person in it, and 
leaders have also been found to have a better 
self-concept than their followers. It has 
been demonstrated in a variety of situations 
that field-dependent persons are more sen- 
sitive to environmental influences than 
field-independent persons. Greene (1976) 
found that those who are field dependent are 
more likely to follow the recommendations 
of a counselor, while Konstadt and Forman 
(1965) showed that field-dependent children 
were more sensitive to disapproval than 
field-independent children.

A few studies have examined cognitive
style and leadership, though not enough has
been done to allow definite conclusions. 
Carter and Nixon (1949) used the Gotts- 
chaldt figures, from which the EFT was 
adapted, and found that performance on this 
test was related to leadership on an intel- 
lectual task but not on other types of tasks. 
As intelligence was also related to test per- 
formance and was not controlled, it is not 
possible to conclude what accounted for the 
relationship, since in many of the studies 
reported by Mann and Stogdill, intelligence
was related to leadership. 

In much of the research done in this area, 
including that of Witkin, there has been no 
attempt to investigate whether the person- 
ality differences might be more related to 
intelligence than cognitive style. The issue 
does not seem to be whether the analytic 
individual is more intelligent, but rather 
whether the personality differences are in 

fact related to perceptual differences or are 
an artifact of the intellectual differences. 
The question is raised whether some per- 
sonality measures would correlate as well 
with IQ as with field independence. There 
is the additional possibility that some com- 
bination of the two types of measures may be 
an even more powerful predictor of the per- 
sonality variables. 

There is substantial evidence that
Witkin’s cognitive style variable is at least to
some degree related to intelligence, with
field-independent persons tending to score
better on intelligence tests than
field-dependent persons. This was noted in
Witkin's own research; and in some cases, the
correlations between the perceptual mea- 
sures and scores on the Wechsler intelligence 
scales were higher than those between the 
perceptual and personality measures. On 
the basis of factor analytic studies, it was 
concluded that field dependence was mainly 
related to certain subtests of the WISC and 
WAIS rather than to general intelligence, 

and Witkin (Witkin et al., 1962) feels that it 
is not accurate to say that field-independent 
persons are more intelligent than field de- 
pendent. Many other studies, however, 
have shown a significant relationship be- 
tween this cognitive style measure and a 
variety of intelligence tests (Spotts &
Mackler, 1967; Bigelow, 1971; Massari &
Massari, 1973; Crandall & Lacey, 1972). 
There also seems to be suggestive evidence 
that some aspects of personality, and espe- 
cially self concept, are linked to intelligence, 
and these relationships are often of at least 
the same magnitude as many of those be- 
tween cognitive style and the personality 
measures used by Witkin (Brookover, 1962; 
Ringness, 1961; Sears, 1963; Piers & Harris, 
1964).

The purpose of this study is to investigate 
the relationship between field independence, 
intelligence, self-concept, and leadership in 
boys. The goal is to determine whether 
leadership is related to field independence 
when intelligence is controlled and whether 
it relates more strongly to intelligence than 
to analytic cognitive style. Leadership was
measured in the context of a leaderless group 
activity, which provided the opportunity for 
individuals to take either a leadership role 
or a more passive role in the group. A self- 
concept test was also administered to de- 
termine whether this attribute might be a 
factor relating to performance differences of 
those of differing cognitive styles or levels of 
intelligence. 

It was hypothesized that both IQ and field 
independence would be positively related to 
leadership. It was also expected that those 
who were field independent and also scored 
well on the IQ measure would show the most 
leadership, those who were field dependent 
and low on the IQ measure would show the 
least leadership, while the groups of those 
who were field independent - low IQ and
field dependent - high IQ would have
intermediate scores. One goal of the study
was to determine whether IQ or cognitive 
style would show the stronger relationship 
to leadership and which factor would be 
stronger when an individual was high in one 
area and low in the other. 


Method 
Subjects 


The subjects were boys selected from the sixth grades 
of six schools in the Columbus, Ohio, public school 
system. All the schools were located in middle- to
upper-middle-class neighborhoods, and the children as
a group were of high average intelligence (M IQ = 114.2,
SD [illegible]).


Measures


As a measure of cognitive style, a form of the Em- 
bedded Figures Test developed for groups was used. 
The Group Embedded Figures Test (GEFT) (Witkin, 
Oltman, Raskin, & Karp, 1971) consists of 18 embedded 
figures that are scored and seven practice problems. 
While this test was not developed for use with children, 
the authors reported that the test differentiated among 
10-year-olds when allowed a longer working time than for
adults. 

To measure self-concept, the test developed by Piers 
and Harris (Piers, 1969) was selected. This test consists
of 60 first-person statements covering a variety of areas, 
to which the individual responds by agreement or dis- 
agreement. 

The intelligence measure used was the Short Form 
Test of Academic Aptitude (Sullivan et al., 1970), which 

is administered regularly to all children in this school 
system. This test consists of language and nonlanguage 
sections, though the nonlanguage tests included few 
items involving perceptual-motor skills. The children 
in this study had been given this test during the last half 
of their fifth-grade year. 

Several methods of assessing leadership in the group 
were used. The amount of verbalization of each group 
member was timed, and the percentage of participation 
of each person determined. Leadership ratings were 
collected from members of the groups using two pro- 
cedures. Each person was asked to select the group 
member including himself who gave the most leadership 
to the group, then the one who gave the least leadership, 
and finally the person he considered second in leader- 
ship. The remaining person was placed in the third 
position. To score this measure, a person received four 
points for the highest rating, three for second, two for 
third, and one for fourth. His point total from all group 
members gave his leadership rank score. The second 
measure was a leadership rating scale adapted from one 
developed by Bass and Norton (1951). This consisted 
of 10 statements on which each member of the group 
had to rate the others using a five-point scale. The 
points for the 10 items were added together, and an in- 
dividual's score on this scale was the sum of these rat- 
ings made by the other three group members. 
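
The two scoring rules just described can be sketched as follows. The function names, the member labels, and the example rankings are hypothetical; the point values (four, three, two, one) and the summing over raters follow the description above.

```python
RANK_POINTS = (4, 3, 2, 1)  # points for first through fourth position


def leadership_rank_scores(rankings):
    """rankings: rater -> list of all four members, ordered from most to least leadership.

    Every member ranks the whole group, himself included, so each member's
    score is the sum of the points he receives from all four rankings.
    """
    scores = {member: 0 for member in rankings}
    for ordered in rankings.values():
        for points, member in zip(RANK_POINTS, ordered):
            scores[member] += points
    return scores


def leadership_scale_scores(ratings):
    """ratings: rater -> {ratee: list of ten item ratings on a 1-5 scale}.

    Raters rate only the other three members; a member's score is the sum
    of all item ratings he receives.
    """
    scores = {}
    for given in ratings.values():
        for ratee, items in given.items():
            scores[ratee] = scores.get(ratee, 0) + sum(items)
    return scores


# Hypothetical rankings for one group of four boys.
rankings = {
    "A": ["A", "B", "C", "D"],
    "B": ["A", "B", "D", "C"],
    "C": ["B", "A", "C", "D"],
    "D": ["A", "C", "B", "D"],
}
rank_scores = leadership_rank_scores(rankings)
print(rank_scores)
```

Under this rule each ranking distributes ten points, so the four scores in a group always total 40 and the group mean is necessarily 10; rank scores therefore carry only within-group information.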


Procedure 


In each school, all of the sixth-grade boys whose 
parents signed permission slips for participation were 
brought together for group testing. The children were 
first given the GEFT, allowing 20 minutes for comple- 
tion, and then the self-concept test. The IQ scores were 
obtained from school records. 

Following the initial testing, subjects were selected 
to participate in the group task. For both the intelli- 
gence and cognitive-style tests, subjects in each school 
were divided into thirds. Children who scored in the 
top or bottom thirds on both tests were eligible for 
Participation in the groups. To form a group, one 
person was selected from each of the four combinations 
of high and low scores on the two tests, and they con- 
stituted one group. Ten groups were formed in this 
manner. Because the groups were formed within
schools, the members generally knew each other at least
casually. 
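
The selection rule above can be sketched in code. This is an illustration under assumptions rather than the author's procedure: thirds are assigned by rank position within one school, ties are broken arbitrarily by sort order, and the names and scores are hypothetical.

```python
def tertile_labels(scores):
    """Label each subject 'low', 'mid', or 'high' by rank position within one school.

    Assumption: with n subjects, the bottom n // 3 are 'low' and the top
    n // 3 are 'high'; ties are broken by sort order.
    """
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    cut = n // 3
    labels = {}
    for position, subject in enumerate(ordered):
        if position < cut:
            labels[subject] = "low"
        elif position >= n - cut:
            labels[subject] = "high"
        else:
            labels[subject] = "mid"
    return labels


def form_groups(iq, geft):
    """Form groups of four: one subject from each high/low combination of GEFT and IQ."""
    iq_label, geft_label = tertile_labels(iq), tertile_labels(geft)
    cells = {(g, q): [] for g in ("high", "low") for q in ("high", "low")}
    for subject in iq:
        key = (geft_label[subject], iq_label[subject])
        if key in cells:  # middle-third subjects are not eligible
            cells[key].append(subject)
    n_groups = min(len(members) for members in cells.values())
    return [tuple(cells[key][i] for key in sorted(cells)) for i in range(n_groups)]


# Hypothetical scores for six boys in one school.
iq = {"a": 90, "b": 95, "c": 100, "d": 105, "e": 120, "f": 125}
geft = {"a": 3, "e": 4, "c": 9, "d": 10, "b": 16, "f": 17}
groups = form_groups(iq, geft)
print(groups)
```

The number of groups a school can supply is limited by its scarcest cell, which in practice is likely to be one of the two mixed cells when the tests are positively correlated.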

The task that was selected for the group was a rela- 
tively unstructured construction problem. The ma- 
terials were the Fischertechnik building sets, which 
consist of materials such as building blocks, wheels, and 
gears from which a variety of items such as buildings, 
towers, and vehicles can be made. Members of the 
group were seated around a small table in such a way 
that all could have easy access to the materials. They 
were told that during the hour they had to work together 
to make something from the materials, using guidebooks 
or creating something original. Each child wore a
microphone, and the entire session was recorded on two
stereophonic tape recorders, with each child recorded 
primarily on one track of tape for ease in timing ver- 
balization. Immediately after the session, the children 
were separated, and each was given a sheet that in- 
cluded the leadership rating scale and a section for 
ranking each member of the group on leadership. 


Results 


Correlational analyses were performed for 
the measures administered to all 88 subjects 
with the following results: IQ and field in- 
dependence were significantly related (r = 
.198, p < .05), as were self-concept and cog- 
nitive style (r = .386, p < .001), though the 
correlation between intelligence and self- 
concept was not significant. 

The mean scores on the IQ and cognitive 
style tests for the 40 subjects selected for 
participation in the small groups are pre- 
sented in Table 1. Table 2 lists the results 
of the analyses of variance for the leadership 
variables for these subjects. It is evident 
from the table that IQ was unrelated to 
scores on any of the leadership variables. 
The results for the main effects of cognitive
style present a rather different picture. On
both the leadership ranking and the leadership
scale, the analytic subjects were rated
significantly higher, and they also talked
more than the global subjects.


Table 1
IQ and Group Embedded Figures Test (GEFT) Scores for Each Subgroup for Those
Participating in the Small-Group Task

                               IQ              GEFT
Subgroup             n      M      SD       M      SD
Analytic-high IQ    10    123.4    9.6     16.1     .9
Analytic-low IQ     10    103.6    9.3     15.1    1.4
Global-high IQ      10    123.2   11.3      7.5    1.7
Global-low IQ       10    103.9    9.5      7.2    2.4

Analytic            20    113.2   13.9     15.6    1.3
Global              20    113.6   14.2      7.4    2.0
High IQ             20    123.3   10.2     11.8    4.6
Low IQ              20    103.5    9.2     11.2    4.5


Table 2
Analysis of Variance by IQ and the Group Embedded Figures Test (GEFT) for Leadership
Variables for Subjects Participating in the Small-Group Task

                  High GEFT            Low GEFT             Combined
IQ              n     M     SD       n     M     SD       n     M     SD

Leadership ranking (a)
High           10   11.3    2.9     10    8.7    4.2     20   10.0    3.8
Low            10   11.2    2.6     10    8.8    3.2     20   10.0    3.1
Combined       20   11.2    2.7     20    8.8    3.6

Leadership scale (b)
High           10   92.1   15.2     10   82.4    9.3     20   [illegible]
Low            10   90.8   12.2     10   83.4   16.9     20   87.1   14.8
Combined       20   91.5   13.4     20   82.9   13.2

% speech time (c)
High           10   27.8    8.3     10   24.4    5.9     20   26.1    7.2
Low            10   28.6    7.3     10   19.0    6.7     20   23.8    8.4
Combined       20   28.2    7.6     20   21.7    6.8

(a) [IQ term illegible]; for GEFT, F(1) = 5.84, p = .021, MS = 62.50; for IQ × GEFT, F(1) = .01, ns, MS = .10.
(b) [IQ term illegible]; for GEFT, F(1) = 3.90, p = .056, MS = 731.01; for IQ × GEFT, F(1) = .07, ns, MS = 13.22.
(c) For IQ, F(1) = 1.05, ns, MS = 52.9; for GEFT, F(1) = 8.38, p = .006, MS = 422.5; for IQ × GEFT, F(1) = 1.91, ns, MS = 96.1.
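
The speech-time entries in Table 2 can be checked from the cell statistics alone. The sketch below is an illustration, not the authors' procedure: it assumes a balanced 2 × 2 design with 10 subjects per cell and estimates the error mean square as the average of the four cell variances. The function name `balanced_2x2_anova` and the dictionary keys are hypothetical.

```python
from statistics import mean


def balanced_2x2_anova(means, sds, n):
    """Mean squares and F ratios for a balanced 2 x 2 design.

    means, sds: dicts keyed by (row_level, col_level); n: subjects per cell.
    Each effect has 1 df, so its mean square equals its sum of squares.
    """
    rows = sorted({r for r, _ in means})
    cols = sorted({c for _, c in means})
    grand = mean(means.values())
    row_m = {r: mean(means[(r, c)] for c in cols) for r in rows}
    col_m = {c: mean(means[(r, c)] for r in rows) for c in cols}

    ms_rows = n * len(cols) * sum((m - grand) ** 2 for m in row_m.values())
    ms_cols = n * len(rows) * sum((m - grand) ** 2 for m in col_m.values())
    ms_inter = n * sum(
        (means[(r, c)] - row_m[r] - col_m[c] + grand) ** 2
        for r in rows for c in cols
    )
    # Assumption: with equal cell sizes, the error mean square is the
    # average of the cell variances.
    ms_error = mean(sd ** 2 for sd in sds.values())
    return {
        "ms_rows": ms_rows, "ms_cols": ms_cols, "ms_inter": ms_inter,
        "ms_error": ms_error,
        "f_rows": ms_rows / ms_error,
        "f_cols": ms_cols / ms_error,
        "f_inter": ms_inter / ms_error,
    }


# % speech time from Table 2; keys are (IQ level, GEFT level).
speech_means = {("high", "high"): 27.8, ("high", "low"): 24.4,
                ("low", "high"): 28.6, ("low", "low"): 19.0}
speech_sds = {("high", "high"): 8.3, ("high", "low"): 5.9,
              ("low", "high"): 7.3, ("low", "low"): 6.7}

result = balanced_2x2_anova(speech_means, speech_sds, n=10)
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

Under these assumptions the three mean squares come out at 52.9 (IQ), 422.5 (GEFT), and 96.1 (IQ × GEFT), matching the table note, and the F ratios agree with the reported 1.05, 8.38, and 1.91 to within the rounding of the published standard deviations.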


Discussion 


The results of this study tend to add sup- 
port to the notion that field independence is 
an attribute that relates to some aspects of 
overt behavior. Field-independent subjects 
were found to have a better self-concept and 
showed more leadership in a group situation
than field-dependent subjects, while no relation was
found between IQ and either variable. 
While the correlation between IQ and field 
independence did reach statistical signifi- 
cance, the relationship obviously is not a 

strong one and is somewhat lower than that 
found in most other studies. It is possible 
that the high correlations found with some 
tests were caused by the similarity between 
the kinds of questions they contain and those 
on the cognitive-style tests, while the test
used in this study had few questions of this 
type. 
The highly significant correlation found 
between cognitive style and self-concept 
seems to substantiate the conclusions drawn 


by Witkin using projective tests. This was 
also similar to results obtained by Pawelk- 
iewicz and McIntire (1975), who found that 
field-independent children scored higher 
than others on a self-concept test similar to 
the one used in this study. 

The expectation that those with an ana- 
lytic cognitive style would show more lead- 
ership in the small group sessions than global 
individuals was confirmed because on all 
three measures the analytic subjects scored 
significantly higher. The best measure 
seemed to be the forced-choice leadership 
ranking because this required the children
to give different scores to each person in the 
group. The leadership scale, on the other 
hand, allowed the rater to give the same 
score to all subjects, which was what some
did. This reduced the size of the differences, 
though the result was still significant. 

The expectation that the high-IQ subjects 
would show more leadership than the low-IQ 
subjects was not confirmed, as it is quite 
clear that there was no difference on any 
measure that approached significance. This 
result was quite unexpected, as there had 
been previous indications that leadership 
tends to relate to intelligence. In addition,
since the children generally knew each other 
and probably had some idea of how the other 
members of their group did academically,
which does relate to IQ, it seemed that the
children might have had a tendency to rate 
the more able students as leaders. Clearly, 
this was not the case. 

An examination of the data shows that the 
analytic-high-IQ subjects had the highest 
mean scores on the two leadership measures, 
while the global-low-IQ subjects were lowest 
only on speech time. Since the differences 
between the highest or lowest and the next 
closest score were often very small, the re- 
sults regarding the combination of the two 
measures for predicting leadership are in- 
conclusive. It seems that being analytic is 
sufficient in this area and having a high IQ 
does not enhance a person's leadership 
ability. 

The differentiation hypothesis offers some 
possible reasons why individuals who are 
analytic in cognitive style have a more posi- 
tive self-concept and show more leadership 
than those who are global. Witkin feels that 
differentiation is reflective of a style of life; 
the differentiated person can organize, 
structure, and find the important aspects of 
a situation and takes an active approach 
toward dealing with the environment. If the 
field-independent person has the ability to 
analyze a situation better and react more 
appropriately to it, he is likely to have more 
success coping with his environment, which 
will improve his self-concept. The active 
approach he takes toward problem solving 
also enhances his chances for success. The 
child who is successful tends to get positive 
feedback from peers, parents, and teachers, 
which in turn raises his opinion of himself. 

It is evident that the ability to analyze and 
structure situations is an advantage when 
working on a problem-solving task. In 
groups such as those in this study, a person 
who had good organizational abilities could 
more easily establish himself as the leader. 
Because of the limited amount of time, de- 
cisions had to be made quickly, so that the 
person who could analyze and propose a 
course of action was the one who was likely 
to assert himself. Many have found that the 
most active person in a group generally is 
considered the leader, and the relationship 
between activity and field independence 
could also account for the analytic person 
exhibiting more leadership. The question
that is not answered is why the perceptual
tests are related to the ability to analyze a 
situation and an active attitude. The only 
explanation at this point is that these at- 
tributes are necessary to be successful in 
tasks such as separating a figure from its 
embedding structure. 

It has been found that a positive self- 
concept is related to leadership in a group, 
and this may be one factor that mediates 
between cognitive style and leadership. The 
self-concept test was found to correlate sig- 
nificantly with both the leadership ranking 
(r = .404, p < .01) and the leadership scale 
(r = .285, p < .05); and when the self-con- 
cepts of those who received the highest and 
lowest scores on the leadership ranking were 
examined, there was further evidence of this 
relationship. All 10 of those who received 
the highest leadership rankings had a self- 
concept score equal to or greater than the 
mean of the whole sample. The scores of 
those receiving the lowest ranking present a 
somewhat different picture, with five below 
average and four above average, with one 
group having a tie for low score. The con- 
clusion that can be drawn is that while hav- 
ing a positive self-concept does not in itself 
indicate leadership ability, this attribute 
seems to be one prerequisite for leader- 
ship. 

The results of this study add support to 
the notion that psychological differentiation 
as measured perceptually does have validity 
as a construct that relates to certain aspects 
of personality. It also has provided some 
evidence that field independence is a concept 
that is distinctly different from general in- 
telligence. The findings indirectly provide 
support for the differences that Witkin 
found between analytic and global individ- 
uals on characteristics such as self-confi- 
dence, an active attitude, and being able to 
organize and structure an unstructured sit- 
uation. The study does not provide any 
additional data concerning the reasons that 
perception seems to be linked to personality 
characteristics. The question may also be 
raised whether the lack of differences be- 
tween high- and low-IQ groups as well as the 
low correlation between field independence 
and IQ would be found with a sample that 
was more representative of the intelligence
spectrum. Both these areas will require
further research.


Strengthening Affective Components of Creativity
in a College Course 


Gary A. Davis and Kay S. Bull 


University of Wisconsin—Madison 


This study was designed to evaluate the hypothesis that affective components 
of creativity (e.g., attitudes, values, interests, motivations) may be strength- 
ened in a university creativity course. A two-group, two-test design was used 
in which Class A received the How Do You Think (HDYT) test before the 
coursework and the Adjective Check List (ACL; scored for creativity) after the 
training. Students in Class B took the ACL before and the HDYT after the 
training. For each class, the after-training creativity test scores were signifi- 
cantly higher than the before-training scores of the other class. Both tests 
were moderately valid in predicting rated creativeness of student projects. 


The main purpose of this paper is to de- 
scribe a university course in creative think- 
ing that resulted in both objectively mea- 
surable increases in creativeness and in in- 
formal reports of personal benefit. The re- 
port bears on the issue of whether or not 
“creativeness” can be taught. The consid- 
erable disagreement and confusion with this 
problem stems from two different views of 
the personal capabilities necessary for 
creative productivity. On one hand are 
cognitive abilities, which for lack of some- 
thing better we can identify as the fluency, 
flexibility, and originality of Guilford (1967) 
and Torrance (1962). Like intelligence and 
most other cognitive abilities, such creative 
abilities no doubt have genetically-deter- 
mined upper limits. Training won't help 
very much. 

On the other hand are affective compo- 
nents of creativity, components that are in- 
creasingly recognized as absolutely essential 
to creative productivity. This affective side 
of creative thinking includes, for example, 
such personality-related matters as attitudes 
pertaining to open-mindedness and recep- 
tivity to new ideas, values relating to the 
importance of creativity to society and to 
personal development, and interests and 
motivations relating to becoming a more 
flexible, creative person. It is these and
other affective components of creativity that
are trainable. Indeed, virtually every crea- 
tivity training course focuses strongly upon 
changing attitudes and interests in a more 
creative direction. It was therefore affective 
dimensions of creativity that were taught 
and evaluated in the present research. 


Method 


Subjects 


The subjects were students in two undergraduate, 
5-week one-credit creativity classes taught by the first 
author. The courses were taught during the same semester.
The exact number of students completing the
two tests and the criterion projects varied slightly due 
to late registrations, absences, or tardiness in completing 
the projects. However, Class A contained about 87 
students (57 female, 30 male) and Class B about 60 
students (42 female, 18 male). 


Instruments 


The How Do You Think (HDYT; Davis, 1975) test is 
comprised of 102 5-point rating-scale items assessing 
such traits as curiosity, self-confidence, artistic and 
aesthetic interests, risk-taking, energy level, sensa- 
tion-seeking, sense of humor, self-rated originality and 
imaginativeness, openness to new experiences, plus 
biographical information pertaining to past hobbies and 
creative activities. In earlier studies (e.g., Davis, 1975) 
the test was shown to have high internal reliability 
(Hoyt rs from .91 to .95) and acceptable validity for 
predicting the rated creativeness of student projects. 

The Adjective Check List (ACL; Gough, 1952) is a list
of 300 adjectives ordered alphabetically from absent- 
minded to zany. Subjects are asked simply to mark 
which adjectives apply to them. Domino’s (1970) creativity
scale, a subset of 59 adjectives, was validated
with male college students using faculty nominations
as a validating criterion. All 59 items are indicative, 
that is, positively related to creativeness (e.g., active, 
adventurous, artistic, clever, confident, curious, en- 
ergetic, humorous, imaginative). To control for total 
number of adjectives checked, Domino devised a T- 
score transformation key that he generously supplied 
to the authors. 


Creativity Training


The 5-week course provided a mix of academic material
and hands-on involvement activities. The course
covered such topics as creative and problem-solving 
processes, the creative personality, reviews of creativity 
training materials and strategies, creativity tests, the- 
ories of creativity, creative dramatics, creative thinking 
techniques, brainstorming, and others. Throughout, 
active participation was elicited as much as possible. 
Thus, students took creativity tests, participated in 
individual and group exercises and demonstrations, 
practiced brainstorming and other creative thinking 
techniques, and participated in lively creative dramatics 
sessions. 

The critical affective emphasis was explicit in such 
topics as the creative personality and brainstorming and
was implicit in the demonstrations of creativity materials
and tests and in the creative dramatics sessions.
That is, the class talked about appropriate creative attitudes
and predispositions, and participated in activities
requiring these characteristics.


Creativity Criterion 


As part of the course requirements, each student 
produced (a) an art or handicraft project, (b) creative 
writing (poetry or short story), (c) ideas for two inven- 
tions, and (d) ideas for a creative teaching method. The 
projects were turned in 1 month after the end of class. 
The projects were independently rated by each author
on a 7-point “creativeness” scale, using pluses (+) and 
minuses (—) to increase accuracy. In practice, ratings
of 1 and 2 were not given, and so the final scale
contained 13 points, such that a rating of 3 = 1, 3+ = 2,
4— = 3, . . . 7— = 12, and 7 = 13. The projects generally
were not difficult to rate. Some showed little
creative ability, whereas others demonstrated considerable
creative experience and talent. Interrater reliability
was .67 over all subjects. A single averaged
rating for each subject was used in all analyses.


Table 1
Mean Adjective Check List Creativity Scores for Class A (Trained) and Class B (Control)

               Class A (Trained)
Sex           n      M       SD
Males        26    49.47    8.86
Females      44    49.25   10.47
Combined     70    49.44

* p < .05, one-tailed.
** p < .025, one-tailed.
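
The conversion of plus and minus ratings to scale points can be sketched as a lookup table. This is a hypothetical illustration: the source states only the endpoints (3 = 1, 3+ = 2, 7— = 12, 7 = 13), so the evenly spaced steps in between are an assumption, as are the names `build_scale` and `averaged_rating`. A plain hyphen stands in for the minus sign.

```python
def build_scale():
    """Map rating labels ("3", "3+", "4-", ..., "7") to scale points.

    Assumption: each base rating from 3 to 7 has a minus, plain, and plus
    form in ascending order; "3-" and "7+" fall outside the scale described
    in the text and are dropped.
    """
    labels = [f"{base}{suffix}"
              for base in range(3, 8)
              for suffix in ("-", "", "+")]
    labels = [label for label in labels if label not in ("3-", "7+")]
    return {label: point for point, label in enumerate(labels, start=1)}


SCALE = build_scale()


def averaged_rating(rater_1, rater_2):
    """Average the two raters' converted scores for one project."""
    return (SCALE[rater_1] + SCALE[rater_2]) / 2


print(SCALE)
print(averaged_rating("4", "5-"))
```

Averaging the converted scores, not the raw labels, keeps a plus or a minus worth the same one-point step everywhere on the scale.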


Design 


Class A took the HDYT on the first day of class and
the ACL on the last day. Conversely, Class B took the
ACL on the first day and the HDYT on the last day.
Thus the after-training HDYT scores of subjects in Class
B were compared with the before-training HDYT scores
of subjects in Class A. Similarly, the after-training
ACL-Creativity (ACL-Cr) scores of subjects in Class A
were compared with the before-training ACL-Cr scores
of the subjects in Class B. This design essentially
replicated the training and evaluation procedures, using
different subjects and a different assessment instrument.
Pearson rs were used to evaluate relationships among
HDYT scores, ACL-Cr scores, and project ratings.


Results and Discussion 


Consistent with earlier studies, the Hoyt
internal reliability for HDYT was .92; for the
ACL-Cr (using transformed scores) it was
.91.


Tables 1 and 2 summarize the main data
for the ACL-Cr and for the HDYT, respectively.
For both groups and, therefore, both
tests, the trained subjects scored significantly
higher than their respective before-training
controls. Also, for both groups the
training effect was stronger for females than
for males. Generally, the ACL-Cr appeared
to be more sensitive to the creativity
training, which may be due to the large
number of biographical items in the HDYT.
Presumably, such items are less susceptible
to creativity training.
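
The trained-versus-control comparisons can be illustrated from summary statistics alone. The sketch below assumes an independent-groups t test with a pooled variance; the source reports the t values but does not name the formula. It uses the Females and Combined rows of Table 2 (Class B trained, Class A control). The function name `pooled_t` is hypothetical.

```python
from math import sqrt


def pooled_t(n1, m1, sd1, n2, m2, sd2):
    """Independent-groups t statistic from group sizes, means, and SDs."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error


# HDYT scores: trained Class B first, control Class A second.
t_females = pooled_t(41, 338.10, 50.10, 57, 322.42, 47.89)
t_combined = pooled_t(52, 338.09, 47.19, 85, 323.95, 45.96)

print(f"females:  t = {t_females:.3f}")
print(f"combined: t = {t_combined:.3f}")
```

Under the pooled-variance assumption, both rows reproduce the tabled values (1.568 and 1.730) to three decimals.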

A frequency count was made of test items
whose average rating favored the group with
the training. For the HDYT, subjects in the
trained Class B scored higher than the control
Class A subjects on 73 of the 102 items


Table 1, t column: Males .901; Females 1.974*; Combined 2.326**

Table 2
Mean How Do You Think Scores for Class B (Trained) and Class A (Control)

               Class B (Trained)         Class A (Control)
Sex           n      M       SD         n      M       SD         t
Males        11   337.83   35.73       28   327.07   42.42      .742
Females      41   338.10   50.10       57   322.42   47.89     1.568*
Combined     52   338.09   47.19       85   323.95   45.96     1.730**

* p < .10, one-tailed.
** p < .05, one-tailed.


(χ² = 18.98, p < .01). For the ACL, the
trained Class A scored higher than the con- 
trol Class B on a full 55 of the 59 adjectives 
in the creativity scale, again reflecting the 
greater sensitivity of the ACL-Cr to training 
effects. 
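
The item-count statistic can be sketched as a one-degree-of-freedom chi-square against an even split. Treating the test as a goodness-of-fit comparison of "items favoring the trained group" with chance (half the items) is an assumption, although it does reproduce the reported 18.98 for 73 of 102 HDYT items. The function name `chi_square_even_split` is hypothetical.

```python
def chi_square_even_split(favoring, total):
    """Chi-square (1 df) for a count of items against a 50/50 expectation."""
    expected = total / 2
    not_favoring = total - favoring
    return ((favoring - expected) ** 2
            + (not_favoring - expected) ** 2) / expected


# HDYT: 73 of the 102 items favored the trained class.
hdyt_chi_square = chi_square_even_split(73, 102)
print(f"HDYT: chi-square = {hdyt_chi_square:.2f}")
```

The same function could be applied to the 55 of 59 ACL-Cr adjectives; the source reports no chi-square for that count.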

With the ACL-Cr, a series of two-group 
univariate F tests showed that mean differ- 
ences in item ratings reached statistical sig- 
nificance (.05 level) for 22 items, all favoring 
the trained group. These were mainly in the 
areas of self-rated originality (e.g., original, 
clever, unconventional), self-confidence 
(e.g., confident, independent, assertive), 
adventurousness and impulsivity (e.g., ad- 
venturous, ambitious, impulsive, sponta- 
neous), and self-rated intelligence (e.g., in- 
telligent, clear-thinking, rational). Other 
adjectives that showed a significant training 
effect were humorous, sensitive, idealistic, 
reflective, and complicated. 

With the HDYT test, just 12 items showed 
significant group differences in mean ratings, 
all favoring the trained group. These 
seemed to measure traits of curiosity (e.g., “I 
think old attics are dirty and uninteresting”; 
“I think it’s fun to explore museums”) and,


Table 3
Intercorrelations Among How Do You Think (HDYT), Adjective Checklist-Creativity
(ACL-Cr), and Project Rating Scores

                   Class A (n = 63)       Class B (n = 35)
Measure            ACL-Cr     HDYT        ACL-Cr     HDYT
HDYT               .643***                .646***
Project rating     .340***    .352***     .323*      .999**

* p < .05.
** p < .025.
*** p < .005.


similar to results with the ACL, adventurousness
(e.g., “I enjoy the confusion of a big
city”; “I would like to live and work in a
foreign country”) and emotional sensitivity
(e.g., “I am a sensitive person”; “I am
moody”).

Using only subjects for whom all three 
scores (HDYT, ACL-Cr, and creativity 
project ratings) were available, Pearson rs 
were examined for the two classes for males 
and females separately and combined. 
Since no noteworthy sex differences were 
apparent, the combined-sex correlations for 
Class A and Class B appear in Table 3. The
results for the two classes were clearly very 
similar. The relatively high correlations 
between ACL-Cr scores and HDYT scores 
are due in large part to the substantial 
common content of the two instruments. 
Also, both tests appear to be moderately 
valid as predictors of the rated creativeness 
of student projects. 

The research data suggest that the affec- 
tively-oriented content of the creativity 
course did result in increases in the sorts of 
affective traits—attitudes, motivations, 
values, and so on—necessary for creative 
productivity. Of course, the present design 
assumes that the increases in creativity 
scores are not due to transfer effects from the 
pretest to the posttest, nor to responding on 
a social desirability basis, that is, “knowing
the right answer.” The objective evidence
of a positive training effect is supported by 
informal comments made by many students 
as to the personal benefits derived from the 
course. Also, a 1978 student association 
publication listed the course as one “option 
to boredom” that was described as “re- 
freshing and stimulating . . . (and) promotes 
individuality and personal expression.”

It would appear, then, that some affective
characteristics associated with creativeness 
can be strengthened to a measurable degree 
by training in a college creativity course.


Individual Differences in Distraction by Pictures
in a Reading Situation 


Dale M. Willows 
University of Waterloo, Waterloo, Canada 


This research was designed to determine whether the relative difficulty of the 
words being read influences the extent to which good, normal, and poor read- 
ers’ decoding is susceptible to influence by adjunct pictures. Third graders 
were required to read easy, moderate, and difficult one-syllable nouns under 
each of three conditions: a control condition with no pictures, an identifying- 
picture condition, and an unrelated-picture condition. The results showed 
that the reading performance of poor readers as a group was influenced by pic- 
tures under all conditions, while that of normal and good readers was much 
less affected by them. There were, however, marked individual differences 
among the poor readers in the degree of their susceptibility to distraction. 


For many years illustrations have been 
included in the primers used for teaching 
reading on the assumption that pictures fa- 
cilitate children’s acquisition of reading skill 
(Huey, 1908/1968). The amount and sal- 
ience of the artwork in children’s readers and 
other school books has increased drastically 
in the past decade. Thus far, however, few 
researchers have asked, “Does all of this 
artwork help in the business of learning to 
read?” (Gibson & Levin, 1975; Samuels, 
1970). 

On the basis of recent work examining the 
effect of pictures on decoding (Willows, 
1978), there is some suggestion that where 
learning to read is concerned, a picture may 
not always be worth a thousand words. In 
a series of two studies, Willows (1978) demonstrated
that when children in second and
third grade read a set of familiar words, their 
performance was less efficient in terms of
both time and errors if there were pictures in
peripheral vision. An additional finding was 
that the interference caused by pictures was 
greater for less skilled readers. Although 
Willows’s research clearly demonstrated that 
pictures interfered with performance on a 
simple decoding task, the fact that the extent 
of the interference was inversely related to 
reading ability is amenable to more than one 
interpretation: (a) It may be that there are 
individual differences among children in 
susceptibility to visual distraction while they 
are reading. Individuals who are particu- 
larly susceptible to distraction by informa- 
tion in peripheral vision may become poor 
readers as a direct result of their distracti- 
bility. (b) On the other hand, whether or 
not pictures are distracting in a given situa- 
tion may simply be a function of the relative 
difficulty of the particular reading materials 
for each individual. Although the words in 
both of Willows’s studies were decoded with 
a high degree of accuracy by all children, the 
attentional requirement of decoding the 
words may have been quite different for 
children at different levels of reading skill. 
According to a recent theory of automatic 
information processing in reading (LaBerge 
& Samuels, 1974), decoding of a set of words 
can, with practice, become automatized so 
that word recognition can occur without at- 
tention. When individuals are required to 
read words they have already automatized, 
they may be able to process extraneous pictorial
information simultaneously with no
performance decrement on the decoding 
task. In Willows’s study, the more skilled 
readers may have been decoding the words 
automatically; while for the less skilled 
readers, decoding the same set of words may 
have required the services of attention. 

If the former interpretation of Willows's 
results is correct, then children who are dis- 
tracted by pictures while reading a difficult 
set of words would also suffer interference 
while reading words that are considerably 
easier for them. If the latter interpretation 
is more accurate, then children’s decoding 
should be little affected by pictures in the 
periphery when the words they are reading 
are easy for them, and their performance 
should be vulnerable to distraction when 
they are decoding words that are more dif- 
ficult for them. The present research was 
designed to determine which of these two 
alternative interpretations provides a better 
explanation of the results of the earlier 
studies. 

There were three phases in this research. 
In the first, children were group tested in 
their classrooms on standardized tests of 
reading comprehension and nonverbal intelligence.
In the second phase, the difficulty
level of a large number of nouns was
assessed by presenting them individually to 
each child and recording reaction times and 
errors. Finally, in the third phase, words 
ranked as easy, moderate, and difficult on
the basis of reaction time data were pre- 
sented alone and in two picture-word con- 


ditions to children of good, normal, and poor 
reading ability. 


Phase 1 


This initial phase simply involved the 
group testing of a large sample of children on 
standardized tests of reading achievement 
and intelligence. 


Method 


Subjects. Letters requesting participation in a study
of “factors which influence reading” were sent home
with all of the children in five third-grade classrooms
located in three middle-class parochial schools. All of
the 91 children (48 boys and 43 girls) whose parents
consented were included in this initial phase of the
study.
Procedure. Each classroom of children was group
tested on standardized tests of reading achievement and
IQ on two consecutive days. On the first group testing
day, Form A (Scale 2) of the Institute for Personality 
and Ability Testing (IPAT) Culture Fair Intelligence 
Test (Cattell & Cattell, 1960) was administered, fol- 
lowed by the Comprehension subtest of Form 1 (Survey 
C) of the Gates-MacGinitie Reading Tests (Gates & 
MacGinitie, 1965). On the second group testing day, 
Form B (Scale 2) of the IPAT Culture Fair Intelligence 


Test was administered, followed by the Comprehension
subtest of Form 2 (Survey C) of the Gates-MacGinitie
Reading Tests. All of the standardized tests were ad- 
ministered by the same female experimenter, an expe- 
rienced classroom teacher, who also conducted the in- 
dividual testing for the picture-word task. Both tests 
were scored according to the instructions in the manu- 
als. The reading comprehension raw scores were con- 
verted to standard scores using the tables that accom- 
panied the test. Mean IQ and reading comprehension 
scores were calculated by averaging the two scores obtained
by each child.


Phase 2 


The objective of this phase of the study


was to obtain a set of words that could be 
decoded with a high degree of accuracy by 
both good and poor readers, but which would 
include a range of difficulty. 


Method 


Subjects. The data from 74 children were included
in this phase. The children who participated in the first 
phase but whose data were not included in the second 
phase were 4 children who did not speak English as their 
first language, 8 children who were absent on 1 or more 
testing days in Phase 1 or 2, and 5 children who were so
deficient in their decoding skills that they could not 
complete the Phase 2 task. 

Procedure. The word lists accompanying the reading
series used in the three schools were consulted to select
a set of 160 first- and second-grade nouns with which
third graders should have been familiar. Each of
these nouns was printed in primary type and photographed.
The resulting slides were randomly assigned
to 5 groups of 32 slides each. The order of presentation
of these 5 sets of words was counterbalanced across
subjects. The children were tested individually in a
small quiet room in the school. Each slide was presented
individually on a rearview projection screen
3 feet (.91 m) from the child. From that viewing distance,
the words were easily readable and appeared a little
larger than the print the children were used to seeing
in their readers at school. The instructions were simply
to read each word as soon as it came onto the screen.
The slides were presented one at a time by means of a
carousel projector triggered by an automatic timer. A
reaction time key was employed to record response latency
to each word. The slide remained on the screen
until the child responded. After the slide had been shut
off by the reaction time key, there was a 5-sec pause; the
experimenter then said “Ready?”; and the next word
appeared on the screen 3 sec later. Between each tray
of 32 slides, there was a break during which the exper- 
imenter chatted with the child about topics that were 
unrelated to the task. 

The reaction time key was pressed by the experi- 
menter as soon as the child started to say a word. The 
experimenter then recorded the child's response. Al- 
though a voice key would have been more elegant, our 
pilot testing indicated that more than 20% of the responses
would be lost due to heavy breathing, coughing,
and whispered sounding out of words before actual re- 
sponding. In order to be sure that the reaction time of 
the experimenter was a reliable measure of the child's 
responding, the reaction time measure was also recorded 
by a second experimenter independently for the first 30
children tested. Thus, there were a total of 4,800 re- 
sponses recorded by both experimenters. A Pearson 
correlation of the two experimenters' reaction times 
produced a coefficient of r = .96. Since the reaction 
time measure was found to be very reliable, all calculations
were done using the scores of the principal experimenter
only.


Results 


The 74 children each responded to 160 
words. Of their 11,840 responses, 373 were 
errors. The number of errors per word (out 
of 74) ranged from 0 to 27. Reaction times 
were available for the 11,467 responses that 
were not errors. A small proportion of these 
reaction times (48 of them) were markedly 
longer than the rest. Since even for the most 
difficult word, the mean reaction time was 
just over 2 sec, a score of 3 sec was arbitrarily 
assigned to replace the 48 extremely long 
reaction times. This procedure was followed 
to avoid inflating the mean reaction time
scores for the words on which the unrepresentative
reaction times occurred.


Table 1
Relative Difficulty Level of Words Based on Reaction Times of 74 Third Graders

         Easy                          Moderate                      Difficult
Word     M latency  SD     Errors     Word     M latency  SD     Errors     Word     M latency  SD     Errors

Set 1
boat     1.0691    .2064     1        ring     1.1463    .2464     1        mask     1.2825    .3701     0
sun      1.0766    .1530     0        ham      1.1690    .2961     0        train    1.3887    .3572     0
egg      1.0944    .1779     0        star     1.1742    .2992     2        chair    1.282     .3169     0
jet      1.1069    .2303     1        log      1.1384    .2009     1        foot     1.2493    .3816     4
cat      1.0069    .2270     0        cup      1.1494    .2783     0        bug      1.0600    .4625     1
man      1.0488    .1584     0        cake     1.1454    .2765     0        goat     1.2876    .3663     2
dog      1.0564    .1920     0        shoe     1.1578    .2729     0        gate     1.2502    .3912     2
bed      1.0302    .1957     0        head     1.1551    .2897     0        rake     1.2553    .3978     3
rock     1.0793    .1736     1        pie      1.1386    .3118     2        goose    1.3005    .2878     2
milk     1.0954    .1988     0        tent     1.1595    .2617     1        spoon    1.2655    .2682     0
boy      1.0701    .2643     0        hen      1.1710    .2864     0        pig      1.2458    .3031     0
book     1.0493    .1814     2        cow      1.1459    .2164     2        ball     1.2406    .3420     1
hat      1.0402    .1935     0        fire     1.1412    .1840     0        drum     1.2718    .4207     3
box      1.0932    .1974     0        eye      1.1416    .2918     0        bread    1.3011    .4598     2
fish     1.0967    .2037     0        doll     1.1449    .2857     0        frog     1.2627    .3445     0

Set 2
moon     1.1348    .2797     0        bus      1.2117    .2523     0        chain    1.4342    .4715     5
fox      1.014     .2270     0        lamb     1.2185    .3632     4        sled     1.4027    .4673     1
car      1.0052    .2278     0        pool     1.2127    .2207     3        bowl     1.5304    .4976     5
bell     1.1201    .2849     0        dish     1.1850    .3167     0        grapes   1.9242    .3095     3
leaf     1.1294    .2504     0        nails    1.2042    .2193     2        fence    1.3717    .4846     6
tree     1.0998    .2412     0        key      1.1926    .3351     1        church   1.3132    .3341     3
bird     1.1191    .3178     0        shell    1.1952    .2544     2        plate    1.3947    .4949     4
ear      1.1490    .2425     0        corn     1.2224    .3046     0        kite     1.3689    .4337     5
horse    1.1299    .9759     0        duck     1.1781    .3745     0        roof     1.3417    .4281     1
girl     1.1185    .1566     0        leg      1.2126    .3242     1        saw      1.3985    .4594     0
mouse    1.1309    .2242     0        desk     1.2430    .3626     1        tire     1.4266    .5072     6
door     1.0885    .1776     0        shirt    1.2217    .2919     1        clown    1.3265    .4209     1
bike     1.1339    .2471     0        dress    1.2089    .3023     1        skate    1.3633    .4720     1
ship     1.1219    .2053     0        truck    1.1759    .2905     0        ghost    1.4872    .5452     1
cap      1.1317    .3011     2        frog     1.1954    .3570     0        coat     1.3095    .4234     1

Note. There were two sets (Sets 1 and 2 in the table) of 15 words each at the three difficulty levels.

All of the 160 words were ranked from 
easiest to most difficult on the basis of the 
mean reaction time scores. A similar rank- 
ing was done for the decoding errors on each 
word. From the 160 words, 3 subsets of 30 
words each were selected as easy, moderate, 
and difficult to decode on the basis of the 
reaction time and error scores. These words 
were all one-syllable nouns on which few 
children made decoding errors. In Table 1, 
the means and standard deviations of the 
reaction times are shown for each of the 
three difficulty levels of the selected words. 
The numbers of children (out of 74) who 
made decoding errors on the words are also 
shown. 

In order to determine whether these 
rankings were representative of children 
ranging in absolute reading ability, the 
reading achievement standard scores of the 
74 children were ranked, and the children 

were divided using a median split into good 
and poor readers (without regard for IQ). 
The mean reaction times to the 160 words 
were assigned ranks from shortest to longest 
(1-160) for the good and poor reader groups. 
Then, for the 90 words selected as easy, 
moderate, and difficult, a Pearson correla- 
tion on the ranks for good and poor readers 
produced a coefficient of r(88) = .85, p < .01. 
Although this correlation is less than perfect, 
it certainly indicates that for the most part, 
good readers and poor readers had difficulty 
with the same words. The groupings of 
words into easy, moderate, and difficult sets 
were thus appropriate for all children. Al- 
though a difficult word was by no means 
equally difficult for a good and for a poor 
reader, the set of difficult words was rela- 
tively more difficult for all children than 
those designated moderate or easy. 


Phase 3 


There were three goals in this major phase 
of the study: The first was to compare the 
susceptibility to influence by adjunct pic- 
tures of children at three different levels of 
reading skill decoding a set of familiar words. 
The second was to examine the influence of 
word difficulty on the extent to which ad- 
junct pictures affect decoding. The third 
was to ascertain whether the effects of ad- 
junct pictures on decoding are dependent on 
the picture-word congruity. 


Method 


Subjects. For this final phase of the study, the pic- 
ture-word phase, 48 children were chosen from the 74 
who took part in the second phase. There were 16 good, 
16 normal, and 16 poor readers (8 boys and 8 girls in each 
group), selected on the basis of their reading compre- 
hension and intelligence test scores as explained in the 
next section. These children did not have any known 
problems of vision or hearing nor any clinical symptoms 
of perceptual handicap. 

Selection of subjects. The test-retest reliability of 
the scores on the two forms of the IPAT Culture Fair 
Intelligence Test was r(72) = .61, p <.01, and on the two 
forms of the Gates-MacGinitie Comprehension tests 
was r(72) = .73, p <.01. A Pearson correlation of the 
mean reading comprehension and IQ scores yielded a 
coefficient of r(72) = .46, p < .01. Applying the cor- 
rection for attenuation due to the low reliability of the 
reading and IQ scores (Ferguson, 1971), this correlation 
increased to r = .69. The IPAT is entirely a nonverbal 
IQ test, so a correlation between reading achievement 
and a verbal measure of IQ would undoubtedly have 
been still higher. 
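
The corrected coefficient reported above can be checked with the standard correction-for-attenuation formula, which is assumed here to be the one applied (the text cites Ferguson, 1971, without printing the formula). The function name below is illustrative only.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores.

    r_xy is the observed correlation between two measures;
    r_xx and r_yy are the reliabilities of those measures.
    """
    return r_xy / math.sqrt(r_xx * r_yy)


# Values reported in the text: observed r = .46,
# IQ test-retest reliability = .61, reading reliability = .73.
corrected = correct_for_attenuation(0.46, 0.61, 0.73)
shared_variance = corrected ** 2  # proportion of common variance
print(round(corrected, 2))
```

Squaring the corrected coefficient gives the proportion of common variance, which is the quantity referred to when the two measures are said to share nearly 50% of their variance.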

Because measures of reading ability and IQ share a 
great deal of common variance (nearly 50% on the tests 
used here), it was deemed important to control for IQ 
when grouping children on reading ability. Therefore, 
reading ability was operationally defined in terms of 
both reading achievement and IQ scores. Following a 
method developed previously (Willows, 1974), each 
child's IQ score was used as a predictor of his or her 
reading potential, and the child’s score on the reading 
test was taken as an index of reading performance. The 
size and direction of the discrepancy between predicted 
and obtained scores determined the assignment of 
children to good, normal, and poor reading groups. 

Since the test-retest reliabilities on IQ and reading 
were not very high, all subjects whose IQ or reading 
comprehension scores differed by one standard devia- 
tion or more were dropped from the sample (a total of 
12 children). There remained 62 children who had 
reliable scores on both measures. Using the scores of 
these children, an expected reading level for each IQ 
value was computed from the regression of reading 
achievement on IQ (following the procedure outlined by 
Ferguson, 1971). The discrepancy between obtained 
Gates-MacGinitie reading achievement and predicted 
reading ability based on general intelligence was cal- 
culated for every child. A child was arbitrarily desig- 
nated a poor reader if his or her obtained reading score 
was at least one half of one SD below the predicted level. 
A good reader, on the other hand, was anyone whose 
reading achievement exceeded the predicted level by 
at least one half of one SD. For a given IQ level, then, 
there was at least one SD (eight points) between a good 
and poor reader. Anyone who obtained a discrepancy 
score smaller than plus or minus two was designated a 
normal reader. 


Table 2 
Means and Standard Deviations on Age, IQ, 
and Reading for the Good, Normal, and Poor 
Reader Groups 

                  Chronological     IPAT    Gates-MacGinitie Reading
Group             age (in months)   IQ^a    Comprehension test
Good readers
  M               104.3             111.7   58.7
  SD              3.7               8.4     4.5
Normal readers
  M               100.9             111.8   51.3
  SD              3.5               6.0     2.7
Poor readers
  M               103.2             110.4   45.1
  SD              3.6               10.2    3.6


Note. For each reading group, n = 16. 

^a IPAT IQ represents the score received on the Institute for 
Personality and Ability Testing Culture Fair Intelligence 
Test. 
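
The group-assignment rule described under Selection of subjects can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the regression is an ordinary least-squares line, the half-SD cutoff is taken as 4 points (one SD being given as eight points), the IQ and reading values are invented, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def classify(discrepancy, half_sd=4.0, normal_band=2.0):
    """Map an obtained-minus-predicted reading score to a reader group."""
    if discrepancy >= half_sd:
        return "good"
    if discrepancy <= -half_sd:
        return "poor"
    if abs(discrepancy) < normal_band:
        return "normal"
    return None  # between the bands: not assigned to any group


# Hypothetical (IQ, reading score) pairs, not the study's data.
iq = [95, 100, 105, 110, 115, 120]
reading = [50, 44, 52, 58, 49, 50]

slope, intercept = fit_line(iq, reading)
discrepancies = [r - (intercept + slope * q) for q, r in zip(iq, reading)]
groups = [classify(d) for d in discrepancies]
```

Because the prediction depends on IQ, two children with the same reading score can land in different groups, and a child whose discrepancy falls between 2 and 4 points is left unassigned.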


The reading ability groups thus formed differ from 
those in most other studies in the literature. The more 
usual practice is for the reading scores alone to serve as 
the criterion for group assignment. In this research, the 
reading ability designation is a relative one that de- 
pends both on the reading score and on the IQ score. 
That is, in this research, a child whose IQ is 95 and 
whose reading score is 50 would be considered a rela- 
tively good reader, since his or her reading score is 
higher than that of most other children of similar IQ. 
Conversely, a child with an IQ of 120 and a reading score 
of 50 would be considered a relatively poor reader. Had 
these two individuals been assigned to reading ability 
groups on the basis of their reading scores alone, they 
would both have been assigned to the same group. 
When the joint criterion of IQ and reading is employed, 
the resulting reading ability groups overlap to some 
extent in terms of absolute reading scores. They differ 
substantially, however, in terms of relative reading 
ability. 

In order to insure that the groups of relatively good, 
normal, and poor readers were approximately equiva- 
lent in terms of age and IQ, the children in these groups 
were selected at random from the lists of potential 
subjects, with the constraint that no child who was ex- 
tremely discrepant in age or IQ was included in one 
reading ability group unless there were children with 
similar scores to include in the other groups. Table 2 
shows the means and standard deviations of the three 
groups on age, IQ, and reading comprehension. There 
were no significant differences between the groups in 
age or IQ (F < 1). The groups did, however, differ sig- 
nificantly in their reading comprehension scores, F(2, 
45) = 52.07, p < .01. It should also be noted that the 
groups were not, as they may appear to be, substantially 
above average in intelligence. On the basis of the scores 
of more than 2,000 children tested on the IPAT Culture 
Fair Intelligence Test in this area, the mean and stan- 
dard deviation for the test are 108 and 12, respectively. 
Thus, the groups in this study were only slightly above 
average. 

Test materials. Two pages of 15 words each were 
prepared at the three levels of word difficulty (easy, 
moderate, and difficult) established in Phase 2. Under 
the control condition, each of the resulting 6 pages was 
simply printed in lowercase primary type in 3 rows of 
5 words each on 8½ × 11 inch (22 × 28 cm) pages. In 
an identifying-picture condition, the same 6 sets of 15 
words were used, but the order of the words on each 
page was randomized and ½ inch (1.27 cm) above each 
of the words was a picture that identified it. A line 
drawing of a coat, for example, was placed above the 
word coat. In an unrelated-picture condition, the same 
6 sets of words and pictures as in the identifying-picture 
condition were used, but the words and pictures on each 
page were rearranged so that each word was printed 
below a picture that was not an obvious associate of it. 
The word coat, for example, was printed below a picture 
of a bunch of grapes. 

Procedure. Each child was tested individually in an 
empty classroom in the school. The 18 pages of test 
materials (2 sheets from each of the 3 difficulty levels 
under each of the 3 picture-word conditions) were ad- 
ministered to all of the 48 children, with each subject 
serving as his or her own control. The presentation 
order of the 18 sheets was individually randomized for 
each child. The testing session lasted for approxi- 
mately 25 minutes. 

The session began with a practice period during which 
each child was taught through modeling and instruction 
to read aloud the 15 words on two specially prepared 
practice pages quickly and accurately. The child was 
then told that sometimes the experimenter would try 
“to trick” him or her by putting some pictures on the 
page but not to pay any attention to the pictures and 
simply to read the words aloud quickly and without 
error as before. The child was then given two addi- 
tional pages of practice with unrelated pictures and 
words on the pages. None of the words or pictures used 
in the practice session were the same as those used in the 
actual test materials. It was clear that all of the chil- 
dren understood that the pictures were there to distract 
them and that they should try to ignore them. As soon 
as the practice trials were completed, the test materials 
were introduced without a break in the procedures. On 
each trial, 1 of the 18 pages was placed face down on the 
table in front of the child, with the words “get ready.” 
The page was then turned face up, with the instruction 
“begin.” 

A stopwatch was used to measure the time (to the 
nearest second) that it took each child to read the 15 
words on a page. All sessions were tape recorded, and 
reading errors were scored later from the tapes. An 
error was scored when a child said a nonword, said a 
different word than the one on the page, or had to be 
told a word after 10 sec had elapsed. 

Design. There were three major independent vari- 
ables: reading ability (good, normal, and poor), word 
difficulty (easy, moderate, and difficult), and picture- 
word condition (no pictures, identifying pictures, and 
unrelated pictures). Thus, the design was a 3 × 3 × 3 
factorial design with one between- and two within- 
subjects variables. 


Table 3 
Mean Times (in sec) and Standard Deviations to Read 30 Words at Each Difficulty Level 

                                   Picture-word condition
                    No pictures     Identifying pictures    Unrelated pictures
Group               M      SD       M      SD               M      SD
Good readers
  Easy              14.3   2.17     16.0   2.76             18.0   4.08
  Moderate          17.3   2.39     16.4   2.48             17.8   3.35
  Difficult         17.5   3.69     17.1   3.08             20.3   4.40
Normal readers
  Easy              17.1   3.94     17.5   4.02             21.1   4.69
  Moderate          19.4   4.68     18.9   3.29             21.2   5.68
  Difficult         20.4   6.31     19.9   4.55             24.4   8.65
Poor readers
  Easy              18.7   2.86     20.4   3.91             25.8   7.93
  Moderate          22.5   5.25     21.2   4.94             28.5   11.88
  Difficult         26.3   8.37     23.2   4.26             32.6   12.57


General Results 


Initially a 2 × 3 × 3 × 3 analysis of vari- 
ance (ANOVA) was performed on both the 
time and errors measures, with sex of child 
as an additional between-subjects variable. 
The results of this analysis showed that there 
were no sex differences (neither main effects 
nor interactions) on either measure (F < 1). 
Therefore, sex of child was not included in 
any subsequent analyses. 


Reading Time 


The results of the ANOVA on the reading 
time measure were clear-cut and significant. 
Table 3 summarizes the means and standard 
deviations for each reading ability group at 
the three levels of difficulty. As one would 
expect, there was a main effect of reading 
ability, F(2, 45) = 9.46, p < .01. Better 
readers took less time to decode the words. 

There was also a main effect of the picture- 
word condition, F(2, 90) = 36.67, p < .01, 
and an interaction of reading ability with the 
picture-word treatments, F(4, 90) = 4.35, p 
< .01. These effects are examined in more 
detail later in the subsections on identifying 
pictures and unrelated pictures. 

The word difficulty variable also produced 
a main effect, F(2, 90) = 49.59, p < .01, as 
well as a significant interaction with reading 
ability, F(4, 90) = 4.32, p < .01. The reading 
times increased across the three levels of 
word difficulty; and for poor readers, this 
increase in reading time was more marked. 

Finally, there was an interaction between 
the picture-word conditions and the word 
difficulty dimension, F(4, 180) = 5.14, p < 
.01. This interaction resulted from the fact 
that while the unrelated pictures always 
produced interference relative to the no- 
picture condition, the identifying pictures 
slowed the children down when they were 
decoding the easiest words but facilitated 
their performance when they were reading 
more difficult words. 

In order to elucidate the picture-word 
effects, difference scores were calculated 
between the two picture-word conditions 
and the no-picture control condition. These 
difference scores indicate the extent to which 
the pictures facilitated or interfered with 
speed of decoding. 
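
As a rough illustration of these difference scores, the sketch below applies the same subtraction (no-picture time minus picture-condition time) to the cell means in Table 3. The study computed the scores for each child, so the values here only approximate the group-level pattern; the dictionary layout and function name are assumptions made for the example.

```python
# Cell means (sec) from Table 3, in the order
# (no pictures, identifying pictures, unrelated pictures).
MEANS = {
    ("good", "easy"): (14.3, 16.0, 18.0),
    ("good", "moderate"): (17.3, 16.4, 17.8),
    ("good", "difficult"): (17.5, 17.1, 20.3),
    ("normal", "easy"): (17.1, 17.5, 21.1),
    ("normal", "moderate"): (19.4, 18.9, 21.2),
    ("normal", "difficult"): (20.4, 19.9, 24.4),
    ("poor", "easy"): (18.7, 20.4, 25.8),
    ("poor", "moderate"): (22.5, 21.2, 28.5),
    ("poor", "difficult"): (26.3, 23.2, 32.6),
}


def difference_scores(means):
    """Return (NP - IP, NP - UP) per cell.

    Positive values mean the pictures sped reading up (facilitation);
    negative values mean they slowed it down (interference).
    """
    scores = {}
    for cell, (no_pic, identifying, unrelated) in means.items():
        scores[cell] = (round(no_pic - identifying, 1),
                        round(no_pic - unrelated, 1))
    return scores


diffs = difference_scores(MEANS)
for cell, (np_ip, np_up) in diffs.items():
    print(cell, np_ip, np_up)
```

On the cell means, every NP − UP value is negative, and NP − IP is negative for the easy words but positive for the moderate and difficult words in all three groups.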

Identifying pictures. An ANOVA on the 
difference scores between reading times 
under the identifying-picture and no-picture 
conditions indicated that there was no main 
effect of reading ability (F < 1), but that 
there was a main effect of word difficulty, 
F(2, 90) = 8.13, p < .01. The interaction 
between reading ability and word difficulty 
was not significant, F(4, 90) = 1.51, p > .10. 

Figure 1. Mean difference in reading time between the 
no-picture condition (NP) and the identifying-picture 
condition (IP). 


It can be noted from Table 3 that for the 
reading time measure, the size of the stan- 
dard deviations tended to be proportional to 
the treatment means. This correlation also 
existed for the difference scores used in this 
analysis. Although the F test is known to be 
relatively little influenced by heterogeneity 
of variance when cell sizes are equal, the 
precaution was taken of performing a loga- 
rithmic transformation on the scores to 
normalize the distribution (as suggested by 
Myers, 1972). A reanalysis of the trans- 
formed data produced an identical pattern 
of results. There was no main effect of 
reading ability (F < 1); there was a main ef- 
fect of word difficulty, F(2, 90) = 6.43, p < 
.01; and there was no significant Reading 
Ability × Word Difficulty interaction, F(4, 
90) = 1.79, p > .10. 
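
The rationale for a logarithmic transformation when standard deviations grow in proportion to means can be seen in a small sketch. The reading times below are invented, and the example covers raw times only; how the transformation was applied to the difference scores, which can be negative, is not described in the text.

```python
import math
import statistics


def sd_of_logs(values):
    """Sample standard deviation of the natural logs of the values."""
    return statistics.stdev([math.log(v) for v in values])


# Hypothetical reading times (sec). `slow` is `fast` multiplied by 2,
# so both its mean and its SD are twice as large on the raw scale.
fast = [12.0, 14.0, 15.0, 17.0]
slow = [2 * t for t in fast]

raw_ratio = statistics.stdev(slow) / statistics.stdev(fast)
log_ratio = sd_of_logs(slow) / sd_of_logs(fast)
print(raw_ratio, log_ratio)
```

Multiplying every score by a constant only shifts its logarithm by a constant, so the spread of the transformed scores no longer depends on the size of the mean.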

As shown in Figure 1, the word difficulty 
effect resulted from the fact that when the 
words were easy, the identifying pictures 
interfered with performance; and when the 
words were more difficult, the identifying 
pictures facilitated performance. 

It can also be seen in Figure 1 that this 
effect appeared to be greater among the poor 
readers. To examine more closely the ap- 
parent relationship between reading ability 
and the influence of identifying pictures, a 
Pearson correlation was computed. For 
each of the six different pages of words (two 
pages at each of the three levels of difficulty), 
a difference score was calculated between the 
reading times when no pictures (NP) were 
present and when the identifying pictures 
(IP) were present (NP − IP). Since any 
deviation from the no-picture baseline (ei- 
ther facilitation or interference) reflected an 
influence of the pictures, the absolute 
numbers were totaled for each child as an 
indication of their susceptibility to influence 
by identifying pictures. The zero-order 
correlation between this measure of sus- 
ceptibility to influence by pictures and the 
Gates-MacGinitie Reading Comprehension 
test scores was r(46) = −.54, p < .01. The 
partial correlation, controlling for nonverbal 
IQ (IPAT scores) was r(45) = −.52, p < .01. 
Thus, it is clear that the less skilled the 
reader, the greater was the influence on 
reading time of identifying pictures above 
the words. Moreover, this inverse correla- 
tion was not accounted for by general non- 
verbal intelligence. 
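
A partial correlation of this kind can be obtained from three zero-order correlations with the standard first-order formula, sketched below. Only the zero-order value of −.54 comes from the text; the two correlations involving IQ are hypothetical placeholders, since they are not reported for the 48 children, and the function name is illustrative.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# x = susceptibility to pictures, y = reading comprehension, z = IPAT IQ.
# r_xy is reported in the text; r_xz and r_yz are made-up placeholders.
r_partial = partial_correlation(r_xy=-0.54, r_xz=-0.20, r_yz=0.30)
print(r_partial)
```

When the control variable is only weakly related to the other two, the partial coefficient stays close to the zero-order value, which is the pattern the reported coefficients show.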

Unrelated pictures. An additional 
ANOVA on the difference scores between 
reading times under the unrelated- and no- 
picture conditions showed that both reading 
ability, F(2, 45) = 4.07, p < .05, and word 
difficulty, F(2, 90) = 4.43, p < .01, produced 
significant main effects, and that there was 
no interaction between them (F < 1). To 
insure that the heterogeneity of variance 
that was evident in the difference scores here 
as well had not radically affected the out- 
come of the ANOVA, the analysis was re- 
done after a logarithmic transformation was 
performed on the scores. The ANOVA on 
the transformed scores resulted in essentially 
the same pattern of results. There were 
significant main effects of reading ability, 
F(2, 45) = 3.70, p < .05, and of word diffi- 
culty, F(2, 90) = 4.42, p < .01; and the 
Reading Ability × Word Difficulty interac- 
tion was not significant, F(4, 90) = 1.11, p > 
.10. 


Figure 2. Mean difference in reading time between the 
no-picture condition (NP) and the unrelated-picture 
condition (UP). 

Figure 3. Frequency (f) distribution of picture-word interference scores (no-picture minus unre- 
lated-picture conditions [NP − UP]) for good, normal, and poor readers. 

As shown in Figure 2, unrelated pictures 
interfered with decoding, and the poor 
readers suffered substantially more inter- 
ference from the unrelated pictures than 
both good and normal readers. The amount 
of variance accounted for by reading ability 
was examined following the same procedure 
as just described for the identifying-picture 
situation. Absolute difference scores were 
calculated between the unrelated-picture 
(UP) and the no-picture decoding times (NP 
− UP) for each page of words. Again, 
Pearson correlations were computed to de- 
termine the degree to which the influence of 
unrelated pictures was a function of reading 
ability. The zero-order correlation between 
the difference scores and Gates-MacGinitie 
Reading Comprehension scores was r(46) = 
−.62, p < .01. The partial correlation con- 
trolling for nonverbal IQ was r(45) = −.52, 
p < .01. Clearly, the less skilled readers 
were more vulnerable to influence by the 
unrelated pictures, and nonverbal intelli- 
gence factors were not primarily responsible 
for this inverse correlation. 

The main effect of word difficulty (shown 
in Figure 2) resulted from the fact that there 
was less interference on the moderate than 
on the relatively easier and more difficult 
words. Without further research this effect 
is not easily interpretable. The finding may 
have something to do with the particular 
picture-word combinations used in the 
moderate condition. It cannot be inferred 
from these data that, in general, words of 
moderate difficulty produce less interference 
than easy and difficult ones. In this exper- 
iment, word difficulty was clearly relative to 
the reading ability of the child. By exam- 
ining the mean reading times in Table 3, it 
can be seen, for example, that in the no-pic- 
ture baseline condition, the normal and poor 
readers found the moderate words somewhat 
more difficult than the good readers found 
the most difficult words. 

Individual differences. In Figure 3, the 
amount of interference caused by the unre- 
lated pictures is plotted for the subjects in 
the three reading ability groups. It is evi- 
dent from the graph that all of the good 
readers and all but two of the normal readers 
showed relatively little interference from the 
pictures. The poor readers, however, were 
more variable in the amount of interference 
they suffered in the presence of the unre- 
lated pictures. Half of the poor readers 
(eight children) were no more susceptible to 
interference than were the good and normal 
readers, but the other half of the poor read- 
ers were more interference-prone than all of 
the good readers and nearly all of the normal 
readers. 

In order to determine whether the same 
children tended to pay more attention to the 
pictures in both the identifying- and in the 
unrelated-picture conditions, a Pearson 
correlation was performed on the NP − IP 
and NP − UP absolute difference scores. 
The correlation coefficient of r(46) = .50, p 
< .01, indicates that there is some consis- 
tency across materials. Children whose 
performance is influenced by identifying 
pictures are also more susceptible to inter- 
ference by unrelated pictures. 


Decoding Errors 


There were very few decoding errors either 
with or without pictures. The mean num- 
bers of decoding errors (out of a possible 30) 
in the no-picture condition at the three dif- 
ficulty levels were respectively: for good 
readers, .31, .75, and .63; for normal readers, 
.63, 1.00, and .75; for poor readers, .94, 1.00, 
and 3.06. The differences in the numbers of 
errors children made in the no-picture and 
in the two picture-word conditions were 
examined to further investigate the facili- 
tative and interfering effects of pictures. 

Identifying pictures. When the decoding 
errors scores under the no-picture condition 
were subtracted from the identifying-picture 
condition, a measure of the effect of the 
identifying pictures was obtained for each 
child. An ANOVA on these scores showed 
that there was a main effect of reading abil- 
ity, F(2, 45) = 4.40, p < .05, and an interac- 
tion between reading ability and word dif- 
ficulty, F(4, 90) = 7.93, p < .01. The iden- 
tifying pictures did not influence good and 
normal readers’ decoding accuracy. Poor 
readers, however, made fewer decoding er- 
rors when there were identifying pictures 
above the most difficult words. It should be 
noted, however, that decoding accuracy 
could not improve very much for the better 
readers at any level of word difficulty be- 
cause it was almost perfect already. For the 
poorer readers also, there was less room for 
improvement at the easier levels of word 
difficulty. In view of this floor effect, any 
conclusions based on the decoding errors 
data are necessarily limited. 

Unrelated pictures. The difference 
scores between the no-picture condition and 
the unrelated-picture condition on decoding 
errors were also analyzed. The only signif- 
icant effect of that ANOVA was a Reading 
Ability × Word Difficulty interaction, F(4, 
90) = 3.61, p < .05. Although this interac- 
tion was statistically significant, the absolute 
size of the effect on decoding accuracy was 
very small. Moreover, the result may again 
simply reflect the floor effect on decoding 
errors. 


General Discussion 


The child's task under all conditions was 
simply to read a set of words aloud with 
speed and accuracy. A baseline for each 
child's performance was provided by reading 
time and errors scores on each set of words 
under the no-picture control condition. 
When pictures were placed on the page, if 
the child read either faster or slower than in 
the baseline condition, then that change was 
clearly attributable to the presence of the 
pictures. 


Difficulty of Decoding 


If poor readers are more susceptible than 
good readers to influence by pictures simply 
as a function of the relatively greater diffi- 
culty of the decoding task for them, then it 
would have been expected that the perfor- 
mance of poor readers would be less in- 
fluenced by pictures when they were de- 
coding words that were easier for them, and 
that more skilled readers might be more in- 
fluenced by pictures when they were de- 
coding words that were relatively more dif- 
ficult. In terms of the theory of automatic- 
ity (LaBerge & Samuels, 1974), if the de- 
coding of the words could be performed 
without attention, then the processing of 
additional information (the pictures) could 
take place simultaneously with little or no 
performance decrement. The results of the 
study could not, however, be accounted for 
by an automaticity interpretation. The 
decoding performance of the poor readers 
was considerably influenced by the pictures 
at all levels of word difficulty. That of good 
and normal readers was more, not less (as the 
automaticity view would predict), affected 
by pictures at the easiest level of word diffi- 
culty. 

In view of the fact that the differences 
among reading ability groups in suscepti- 
bility to influence by pictures cannot be ac- 
counted for by an automaticity interpreta- 
tion, it seems most reasonable to attribute 
reading speed increases and decreases in the 
presence of pictures to diversion of attention 
from the decoding task. When the pictures 
on the page identified the words printed 
below them, then paying attention to the 
pictures might be expected to make the task 
easier if the child had some uncertainty 
about the words he or she was decoding (in- 
deed, the child could simply name the pic- 
tures), but might make performance less 
efficient if the words were already very fa- 
miliar. When the pictures were unrelated 
to the words printed below them, then it 
might be expected that processing the pic- 
tures would have a generally debilitating 
effect on efficiency of decoding at all levels 
of difficulty. This, in fact, was the pattern 
of results obtained in the present research. 


Individual Differences 


Under one condition or another, the good, 
normal, and poor readers were all influenced 
in their decoding performances by the 
presence of pictures. The degree of this ef- 
fect was, however, clearly different de- 
pending upon the child's level of reading 
skill. Good and normal readers were less 
susceptible to influence by pictures than 

were poor readers. Moreover, the patterns 
of performance of good and normal readers 
were quite similar to each other in the vari- 
ous conditions and differed from those of 
poor readers. The poor readers paid more 
attention to the pictures in all conditions. 
In the identifying-picture condition, poor 
readers were debilitated in their reading 
performance when they were decoding easy 
words and were facilitated while reading the 
more difficult words. In the unrelated- 
picture condition, the pictures interfered 
with the poor readers’ decoding at all levels 
of word difficulty. 

Reading problems undoubtedly have 
many and varied origins. On the basis of the 
results of this experiment, it is clear that a 
significant proportion of children with 
reading difficulties are less able to resist 
being influenced by adjunct pictures while 
reading than are more proficient readers. 
Since the pictures in their primers and other 
school books are rarely placed above the 
words they identify, children who have a 
tendency to refer to nearby pictures when 
they are unsure of words will almost invari- 
ably be misled. 

Thus, although the adjunct pictures in 
children’s early readers are surely intended 
as an aid to reading, it is clear from the 
present research that for some individuals, 
such pictures may seriously interfere with 
reading performance. Indeed, children who 
are especially susceptible to visual distrac- 
tion in a reading situation may actually be 
“reading failures” as a result of the presence 
of adjunct pictures in their primers. Given 
this possibility, perhaps, on entering school, 
children should be screened for susceptibil- 
ity to distraction by pictures and their 
primers should be assigned accordingly. 
However, before any drastic measures are 
taken, other picture research employing 
reading tasks that more closely resemble 
“normal” reading should certainly be un- 
dertaken. 


A battery of problem-solving and divergent-thinking tasks was administered 
to 91 fifth graders from a middle-class background. The purpose of the re- 
search was to attempt identification of a factor that could be labeled problem 
solving and be distinct from divergent-thinking factors. In exploratory factor 
analyses, principal axis, alpha, and maximum-likelihood factor procedures 
with orthogonal and oblique rotations were computed. The three-factor solu- 
tions across all factor methods and rotations were quite consistent. In addi- 
tion to a major ideational fluency factor and a small school achievement factor, 
a factor consisting of tasks requiring subjects to analyze given problem condi- 
tions and sequence steps to achieve a stated goal was described. This factor 
accounted for between 20% and 30% of the total variance and was labeled a 
problem-solving factor. Results were discussed in terms of possible psycho- 
logical processes underlying divergent thinking and problem solving. 


The psychology of human problem solv- 
ing and creative and divergent thinking has 
been an area of great activity and variety 
since J. P. Guilford’s address almost 30 years 
ago (Guilford, 1950). While there has been 
a focus on a specific number of creative 
thinking skills, such as fluency, flexibility, 
originality, elaboration, novelty, and clev- 
erness (Guilford, 1967; Wallach, 1970), there 
has been no similar focus in problem solving. 
Guilford (1958, 1967, 1971) has not been 
optimistic about discovering a single
problem-solving skill or factor and, in fact,
considers the potential skills involved in
problem solving as numerous as the problems
humans may face.

Duncan (1959) reviewed the problem- 
solving literature and concluded that little 
progress would be made in such a diverse 
field until some sort of theory development 
accompanied experimentation. Too many
different kinds of tasks had been used to
measure problem solving, and leaders of re- 
search efforts had failed to generate an 
agreed-upon taxonomy of the cognitive 
processes or skills involved. Reviews of the 
field in succeeding years have attempted 
classification systems (Davis, 1966, 1973;
Feldhusen, Houtz, Ringenbach, & Lash, 
Note 1), but they have not suggested 
whether certain tasks or measurement pro- 
cedures could move the field closer to ade- 
quate theory development. 

Speedie, Treffinger, and Houtz (1976)
have proceeded one step further by classifying
a wide range of problem-solving and
divergent production tasks in an effort to
identify the psychological processes involved.
The criteria employed for the classification
include the following: problem
complexity, amount of previous learning
required, motivational level, scores yielded,
number of constraints on the type of
problem-solving behavior involved, reliability,
and the possibility of group administration.
The authors concluded that the two types of
problems satisfying the majority of the criteria
were verbal maze problems and written
simulation exercises (Speedie et al., 1976).
Mazes and simulations can be made sufficiently
complex to be meaningful and nontrivial
for students, thus going beyond
Johnson’s (1972) definition of a problem as
“any situation in which the subject’s initial
attempts to reach a goal are blocked” (p.
134). All of the information necessary to
solve the problems can be incorporated in
the structure of the tasks. They can also be
constructed using story and real-life themes
to make them highly interesting to children.

The purpose of the present research is to 
describe, via exploratory factor analyses, 
processes likely to be involved in the solution 
of these types of problems as distinct from 
other tasks. The objective is to administer 
a wide variety of well-known problem-solv- 
ing and divergent-thinking tasks, including 
mazes and simulations, and attempt to
identify a factor underlying these two tasks
that is distinct from previously identified
divergent-thinking factors.


Method 


Subjects 


The subjects were 91 white, fifth-grade boys and girls 
from six intact classrooms in a medium-sized suburban 
elementary school in northern New Jersey. There were 
48 girls and 43 boys. All of the subjects were from a 
middle- to upper-middle-class background. Although 
standardized intelligence and achievement information 
was not available, teachers’ ratings of students’ ability 
were obtained. The teachers’ ratings appeared nor- 
mally distributed with a mean rating of “average” (see 
the Procedure section). Nevertheless, the children in 
the present study most likely exhibited above-average 
intellectual ability and achievement when compared to 
the general population. 


Measures 


The materials for the present study involved a wide 
range of experimenter-made instruments. The selection
of each type of task, however, was based on a tentative
model of problem solving as information processing
(Johnson, 1972; Miller, Galanter, & Pribram,
1960; Newell & Simon, 1972); that is, the subject’s main 
task in solving problems is one of generating or acquir- 
ing information, testing that information in terms of the 
stated or implicit criteria for problem solution, and then 
reapplying that information in a new hypothesis or step 
toward the goal. The following measures and tasks 
were included for the present study: 

Achievement rating.— Each teacher was asked to rate 
each of his or her students in terms of general school 
achievement on a 7-point scale. Each of the 7 categories
referred to the degree of success with regular school
subjects: 7 = superior almost all the time, 6 = above
average in general, 5 = average to above average most
of the time, 4 = average most of the time, 3 = below
average to average most of the time, 2 = below average
most of the time, and 1 = little success in any area.

Making words from antelopes.—In 4 minutes, 
subjects were to generate as many English words as
possible, excluding proper names and abbreviations,
using letters in the word antelopes. This task is a 
measure of one type of fluency, word fluency (Speedie
et al., 1976; Wallach, 1970), which is a divergent-thinking
measure.

Running short of things and uses for plastic 
trays.—In these tasks, subjects were asked to think of 
as many ways to conserve energy and natural resources 
in school or to reuse or use for other purposes small, 
clear plastic trays originally used by many supermarkets 
to hold meat cuts (5 minutes each). For both of these 
tasks, in addition to a count of the number of ideas 
generated (ideational fluency), all ideas for all subjects 
were categorized, and a score of the number of different 
categories represented in subjects’ answers was ob- 
tained. This score represented a variable called 
spontaneous flexibility, another possible divergent- 
thinking measure (Wallach, 1970). 

Making boxes.—Subjects were asked to cross out 
lines in a complex pattern of boxes to reduce the number 
of boxes present, without any extraneous lines remaining.
In 5 minutes, the children were to do as many
of these tasks as possible. This task, too, has a considerable
history (Katona, 1940; Guilford, 1967). It is
supposed to represent another production ability, that 
of adaptive flexibility (Wallach, 1970). 

Making change.—In 5 minutes, subjects were asked 
to supply the correct number of coins and types of coins 
required to make a predetermined amount of money. 
There were seven arithmetic problems of this type to do 
in the time allowed (Speedie et al., 1976). 

Buying animals.—In a second arithmetic-type 
problem, subjects were asked to spend an exact amount 
of money, no more or less, to buy different types of an- 
imals, each with different prices. The children were to 
find as many acceptable combinations of purchases as 
possible in 4 minutes (Speedie et al., 1976). 
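
The search the children carried out by hand in the buying-animals task can be sketched as an exhaustive enumeration. The block below is only an illustration: the animal names, prices, and budget are hypothetical (the original task materials are not reproduced in this article), and `exact_purchases` is a name introduced here for the example.

```python
from itertools import product


def exact_purchases(prices, budget):
    """Return every combination of animal counts whose total cost equals budget.

    prices: dict mapping animal name -> price in cents (hypothetical values).
    budget: amount in cents that must be spent exactly, no more or less.
    """
    names = sorted(prices)
    # For each animal, the most that could be bought if nothing else were bought.
    ranges = [range(budget // prices[name] + 1) for name in names]
    solutions = []
    for counts in product(*ranges):
        total = sum(count * prices[name] for count, name in zip(counts, names))
        if total == budget:
            solutions.append(dict(zip(names, counts)))
    return solutions


# Hypothetical price list, in cents, used only to exercise the function.
prices = {"hamster": 50, "parrot": 100, "turtle": 25}
combos = exact_purchases(prices, 100)
for combo in combos:
    print(combo)
```

Each dictionary printed is one acceptable purchase; a subject's score on the task corresponds to how many such combinations he or she found in the time allowed.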

The new bike.—This was the written simulation 
problem-solving exercise. Children were presented 
with a deliberately ambiguous situation in which they 
had the opportunity to earn enough money in a 2-week 
time period to buy an attractive bicycle on sale. In 
order to solve the problem, subjects sought information 
in different forms and of different values. For example, 
the children could earn money by working for a variety 
of neighbors. However, not all the jobs paid the same 
or took the same amount of time. Therefore, the 
subject had to acquire a certain amount of information, 
evaluate it in terms of the goal or some plan of his or her 
future behavior, and then act. More specifically, to be 
able to buy the bike, each subject had to earn $30.00 
within a 14-day period. The score on this task was “1” 
for solution or “0” for no solution. The time involved 
was approximately 20 minutes (Speedie, 1974). 

Verbal maze.—Subjects were presented with a list 
of "spy-name" pairs. The pairings represented “spies” 
able to communicate with one another. The subjects 
were then asked to select pairs of spy names to make a 
chain to pass a message. The problem is that not every 
spy can speak to another spy, and there are several 
possible paths available, varying in length, to pass the 
message. The score obtained was the number of correct 
chains or paths identified (Hayes, 1965). In the present 
study, there were two separate verbal maze tasks, each 
lasting approximately 5 minutes. The verbal mazes and 
new bike tasks were the two measures of hypothesized 
problem-solving ability in the present study. 
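
The structure of a verbal maze can be sketched as a path search over a contact graph. This is a minimal illustration under stated assumptions: the spy names and pairings below are invented, communication is assumed to work in both directions within a pair, and `message_chains` is a hypothetical helper, not part of the original task.

```python
def message_chains(pairs, sender, receiver):
    """List every chain of spies (no spy used twice) from sender to receiver."""
    # Build an undirected contact graph from the spy-name pairs.
    contacts = {}
    for a, b in pairs:
        contacts.setdefault(a, set()).add(b)
        contacts.setdefault(b, set()).add(a)

    chains = []

    def extend(chain):
        last = chain[-1]
        if last == receiver:
            chains.append(chain)
            return
        for nxt in sorted(contacts.get(last, ())):
            if nxt not in chain:  # never pass the message back to an earlier spy
                extend(chain + [nxt])

    extend([sender])
    # Report shorter chains first, since the available paths vary in length.
    return sorted(chains, key=len)


# Invented pairings, used only to exercise the function.
pairs = [("Ann", "Bob"), ("Bob", "Cal"), ("Ann", "Dee"),
         ("Dee", "Cal"), ("Cal", "Eve")]
chains = message_chains(pairs, "Ann", "Eve")
for chain in chains:
    print(" -> ".join(chain))
```

The number of chains returned plays the role of the task score, the number of correct paths identified.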

Purdue Elementary Problem Solving Inventory.— 
Five tasks from the Purdue inventory (Feldhusen,
Houtz, & Ringenbach, 1972) were included. The tasks 
presented subjects with a cartoon-like drawing of chil- 
dren in various problem situations. In response to each 
of the scenes, children were asked to (a) define the 
problem, (b) ask questions to gain more information 
about the situation, (c) guess causes of the problem, (d) 
predict consequences of certain actions, and (e) generate 
possible solutions to the problem. In the regular ad- 
ministration of the inventory, subjects are presented 
with three alternatives to each question about a scene. 
In the present study, however, the response format of 
the Purdue inventory items was altered. Subjects were 
not given alternatives from which to choose, but were 
asked to generate as many ideas in response to the
problem as possible (3 minutes each). With the fluency 
format, it was expected that the Purdue inventory items 
would yield measures of divergent thinking rather than 
problem solving. 
Logic.—Subjects were asked to solve as many of 16 
nonsense syllogisms as possible in 5 minutes. The 
children needed only to respond “true” or “false” to the 
conclusion. Guilford (1967) has used this type of task 
to measure one evaluation ability (evaluation of se- 
mantic implications). It was included in the present 
study because of the hypothesized “evaluative” nature 
of the problem-solving process (Johnson, 1972; Newell 
& Simon, 1972). 

Memory for occupations.—Subjects were asked to 
memorize sets of name and occupation pairs for 2 
minutes. Then, they were asked to select from among
four alternatives the name of an object related to the 
occupation of a particular person’s name. Guilford 
(1967) also made use of this type of task to assess a 
memory ability (memory for semantic implications). 
Evidence of the importance of memory abilities in 
problem solving has been presented by numerous
investigators (Greeno, 1973; Guilford, 1967; Newell & Simon,
1972).

Procedure 


Each task was administered by a trained experimenter.
The times of day of testing varied from class to class
to control for possible fatigue. Several tasks were
administered to a group at each session. Each classroom
was tested once per week and sometimes once every
2 weeks over a 6-month period.


The scoring of all tasks was based on either the 
number of correct solutions or the number of ideas 
generated. Flexibility scores for both the running- 
short-of-things and plastic-trays problems were based 
on the actual responses made by the children. All responses
were tabulated, frequencies of each response
noted, and categories of responses formed. For flexibility
scores, the number of different categories mentioned
by subjects was recorded (specific category labels
may be obtained from the authors).


Results 


Interjudge reliability estimates were obtained
for all of the open-ended, idea generation
tasks. Samples of 20 papers were
graded by two experimenters. For all such
tasks, the interrater reliabilities ranged from
.83 to .98.

In order to prevent the predetermination 
of factors based on sets of measures reason- 
ably well correlated, summed variables for 
ideational fluency, spontaneous flexibility, 
Purdue inventory problem-solving fluency, 
and the verbal maze were used in place of 
individual tasks. Correlations between the 
two ideational fluency tasks (.49), between
the two spontaneous flexibility scores (.48),
among the five Purdue inventory problem-solving
fluency tasks (median r = .55; range
was .30 to .60), and between the two verbal
maze problems (.66) were significant (p <
.01). Therefore, an ideational fluency score
was generated by adding together the number
of ideas on both the running-short-of-things
and plastic-trays problems. A similar
procedure was used to obtain spontaneous
flexibility, problem-solving fluency, and
verbal maze totals.
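
The two steps described above, checking that paired tasks correlate and then summing them into one composite, can be sketched as follows. The six pairs of scores are invented for illustration, and the t statistic shown is the standard test of a Pearson correlation against zero, which is assumed here rather than reported by the authors.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


# Invented idea counts for six children on the two ideational fluency tasks.
running_short = [4, 7, 5, 9, 6, 3]
plastic_trays = [5, 6, 4, 10, 7, 2]

r = pearson_r(running_short, plastic_trays)
# Composite score: ideas on both tasks added together, child by child.
ideational_fluency = [a + b for a, b in zip(running_short, plastic_trays)]

# The reported correlation of .49 with the study's 91 subjects.
t_reported = t_for_r(0.49, 91)
print(round(r, 2), ideational_fluency, round(t_reported, 2))
```

Summing the paired tasks keeps two highly similar measures from defining a factor by themselves, which is the stated reason for using totals.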

Table 1 presents the intercorrelations
among all tasks employed in the present
study. In general, the ideational fluency,
spontaneous flexibility, and Purdue inven- 
tory fluency total were substantially related. 
Correlations greater than or equal to .30 were 
also obtained between the antelopes total, 
making-change total, verbal maze total, the 
Purdue inventory problem-solving total, and 
a number of other measures. 


Factor Analyses 


Merrifield (1974) has suggested that the
application of factor analytic techniques is
more of an art form than a science. In cases
where exploratory hypotheses are to be
tested, it has been recommended that a variety
of analyses be computed; the interpretation
of results should then be based on
those factor patterns that appear relatively
stable across the different methods and
rotations (Harris & Harris, 1971). Therefore,
we examined principal axis (Harman,
1967), alpha (Kaiser & Caffrey, 1965), and
maximum-likelihood (Lawley, 1943) solutions,
with numbers of factors from 2-10 and
both orthogonal (varimax) and oblique
factor rotations (Harman, 1967).
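
To make the idea of an orthogonal (varimax) rotation concrete, the sketch below rotates a two-factor loading matrix by the closed-form planar angle usually attributed to Kaiser's raw varimax procedure. It is an illustration only: it handles just two factors, omits row normalization, uses invented loadings, and is not a reconstruction of the software used in the study. The function names are introduced here for the example.

```python
from math import atan2, cos, sin


def varimax_pair(loadings):
    """Rotate a two-factor loading matrix by the raw-varimax angle."""
    p = len(loadings)
    u = [x * x - y * y for x, y in loadings]
    v = [2.0 * x * y for x, y in loadings]
    a, b = sum(u), sum(v)
    c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    d = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
    # atan2 keeps the quadrant, so the angle maximizes (not minimizes) the criterion.
    phi = atan2(d - 2.0 * a * b / p, c - (a * a - b * b) / p) / 4.0
    return [(x * cos(phi) + y * sin(phi), -x * sin(phi) + y * cos(phi))
            for x, y in loadings]


def varimax_criterion(loadings):
    """Sum over the two factors of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(2):
        squared = [row[j] ** 2 for row in loadings]
        mean = sum(squared) / p
        total += sum((s - mean) ** 2 for s in squared) / p
    return total


# Invented loadings: a perfect simple structure turned by 0.3 radians.
theta = 0.3
unrotated = [(cos(theta), sin(theta)), (cos(theta), sin(theta)),
             (-sin(theta), cos(theta)), (-sin(theta), cos(theta))]
rotated = varimax_pair(unrotated)
for row in rotated:
    print([round(abs(value), 3) for value in row])
```

Because the rotation is orthogonal, each measure's communality (the sum of its squared loadings) is unchanged; only the way that variance is divided between the two factors changes. Oblique rotations relax this orthogonality and are not covered by the sketch.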
Table 1
Intercorrelations Among Divergent-Thinking and Problem-Solving Measures


Measure 1 2 3 4 5 6 7 8 9 10 11 12
1. Achievement ratings 41317441 /32«4126/. 21:01. -—17. 11 03 32 09 
2. Antelopes 46523) 128: 1] 20 17. 29 36 35 09 
3. Making change 38 .26 19 15 01 30 36 35 13 
4. Buying animals ond 31 TO 05 32 21. 540)" 34 
5. Making boxes 12 18 17 29 14 48 28 
6. Logic 231215—12.4/18:5:—02:1 32,015 
7. Memory 04 11 14 233 08 
8. Ideational fluency 64 [20S a Os) 
9. Spontaneous flexibility 55 33 22 
10. Purdue inventory problem solving Di aL 
11. Verbal maze 30 
12. New bike 
Note. Decimal points have been omitted. 


In all, 30 analyses were computed, and the 
results were quite consistent. The three-factor
solutions across all factor methods and
rotations provided very similar and clear 
separations of the divergent-thinking and 
problem-solving measures. Tables 2 and 3 
present the factor loading matrices for the 
three-factor solutions for the three factor 
methods and two rotational methods. As
can be seen, Factor 1 appears in all but one
case to be a general fluency factor, with
substantial loadings from the ideational
fluency, spontaneous flexibility, and Purdue
inventory problem-solving fluency measures.
Factor 2 appears to represent a different


¹ Further information regarding the analyses can be
obtained from the authors.


Table 2
Factor Loading Matrices for Three-Factor Solutions of Principal Axis, Alpha, and Maximum-Likelihood Methods with Varimax Rotations


Principal axis Alpha Maximum likelihood 
Factor Factor Factor Factor Factor Factor Factor Factor Factor
Measure 1 2 3 1 2 3 1 2 3 

Achievement ratings 26 58 64 29 52 

Antelopes 62 61 26 54 

Making change 25 67 26 65 76 

Buying animals 55 27 51 27 51 32 

Making boxes 48 47 53

Logic 45 53 45 

Memory 27 

Ideational fluency 94 95 96 

Spontaneous flexibility 66 34 63 33 68 32 

Purdue inventory prob- 71 28 72 25 69 31 
lem solving 

Verbal maze 65 30 67 29 66 28 

New bike 47 40 44 
Eigenvalues 3.14 1.53 .59 7.97 2.46 1.57 2.04 1.71 1.53 
% variance 59.7 29.1 11.3 66.4 20.5 13.1 38.6 32.4 29.0

Note. Decimals and loadings below .25 have been omitted.


Table 3
Factor Pattern Matrices for Three-Factor Solutions of Principal Axis, Alpha, and Maximum-Likelihood Methods with Oblique Rotations


Principal axis Alpha Maximum likelihood
Factor Factor Factor Factor Factor Factor Factor Factor Factor
Measure 1 2 3 1 2 3 1 2 3
66 4 
Achievement ratings * P. E 
Antelopes pe d aa 
Making change = i t 
Buying animals 53 E a 
Making boxes 47 He 
Logic 4T s 
Memory S E 
Ideational fluency —96 K s 
Spontaneous flexibility 30 —62 26 59 s a 
Purdue inventory problem solving -70 —28 71 i: 
Verbal maze 64 66 ^ 
New bike 52 43 
Eigenvalues 3.14 1.53 .59 7.97 2.46 1.57 2.04 1.71 1.53
% variance 59.7 29.1 11.3 66.4 20.5 13.1 38.6 32.4 29.0
Note. Decimals and loadings below .25 have been omitted. 


type of underlying ability, with loadings 
from the verbal maze, new-bike simulation 
problem, buying-animals (arithmetic prob- 
lem solving), making-boxes (adaptive flexi- 
bility), and logic (evaluation of semantic 
implications) tasks. In addition, a secon- 
dary loading of flexibility appeared on this 
factor. The percentage of variance ac- 
counted for by these two factors was also 
consistent across analyses. The first factor 
accounted for approximately 60-65% of the 
total variability of the three-factor solution 
in the principal axis and alpha solutions, 
while the second factor accounted for ap- 
proximately 20-30% of the variance. 
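
The percentages quoted here are shares of the variance extracted by the three retained factors, not of the total variance of all measures. Under that reading, each share is a factor's eigenvalue divided by the sum of the three eigenvalues. The short sketch below recomputes the alpha and maximum-likelihood rows from the eigenvalues printed in Tables 2 and 3; the function name is introduced here for illustration.

```python
def percent_of_extracted_variance(eigenvalues):
    """Share of the retained factors' variance carried by each factor, in percent."""
    total = sum(eigenvalues)
    return [round(100.0 * value / total, 1) for value in eigenvalues]


# Eigenvalues as printed for the three-factor solutions.
alpha = percent_of_extracted_variance([7.97, 2.46, 1.57])
maximum_likelihood = percent_of_extracted_variance([2.04, 1.71, 1.53])
print(alpha)
print(maximum_likelihood)
```

Both rows reproduce the tabled percentages, and each sums to 100 within rounding, which is consistent with the phrase "total variability of the three-factor solution."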

The third factor in all analyses appeared 
to represent a school achievement factor, 
with loadings from making change (arith- 
metic skills), antelopes (word fluency), and 
achievement ratings. This factor accounted 
for approximately 10% of the total variance
of the principal axis and alpha three-factor
solutions.


Discussion 


One factor that emerged was consistently 
representative of ideational fluency. 
Spontaneous flexibility and the Purdue inventory
items, whose format was altered to
request many ideas from students, not surprisingly
also represented this factor.
Wallach (1970) has emphasized the importance
of fluency as the single best creative
thinking measure. The present study,
therefore, does tend to support the clearly
identifiable nature of ideational fluency. In
addition, this type of fluency appeared distinct
from possible measures of school or
intellectual ability. Word fluency, for example,
loaded with the third factor and
teacher ratings of achievement rather than
with ideational fluency. This may offer
further support for Wallach’s view that
ideational fluency can provide an estimate
of creative thinking ability separate from
general intellectual performance.

Speedie et al. (1976) have indicated that,
for a variety of reasons, written simulation
problems and the verbal maze or spy problem
are the most promising tasks for the
study of the problem-solving process. The
purpose of the present study was to identify
processes involved in the solution of simulations
and verbal maze problems that are
distinct from other traditional measures of
problem solving and divergent thinking. The
second factor identified in the analyses appears
to indicate that the same processes are
involved in the simulation and maze tasks as 
are also involved in making boxes (adaptive 
flexibility) and buying animals (mathemat- 
ical rearrangement). The common element 
in all of these tasks appears to be the clear 
specification of problem conditions and a 
terminal goal or objective. Rules are pro- 
vided in the statement of the problem that 
guide the solution or search process of the 
subject. In the simulation, various neigh- 
bors offer jobs paying different amounts of 
money but taking different amounts of time 
to complete. In the verbal maze, some spy 
pairs can talk to each other while other spies 
cannot. In making boxes, a certain number 
of lines must be eliminated to leave a certain 
number of boxes remaining. In buying an- 
imals, the animals have specific prices, and 
a certain amount of money must be spent 
exactly. Thus, one might characterize 
Factor 2 as a problem-solving factor involv- 
ing the ability to organize and evaluate var- 
ious problem elements and conditions in the 
proper order to reach a solution. 

The appropriateness of such an interpre- 
tation may be examined from the point of 
view of the individual problem solver as a 
processor of information (Miller et al., 1960; 
Newell & Simon, 1972). To solve a partic- 
ular problem, each of the initial conditions 
existing in the problem statement or goal 
statement must be analyzed and interpreted. 
These conditions limit the types of hypoth- 
eses that may be generated and tested and 
also provide the means by which tentative 
solutions or steps toward the final goal can 
be evaluated. The actual solution sequence, 
then, may be one of the generation of new 
information from given information and 
comparison of progress made toward a final 
goal or set of conditions. To perform this 
activity effectively, one might find highly 
desirable the ability to rearrange problem
elements, such as lines and boxes, amounts 
of money, and sizes of coins or prices of ani- 
mals (adaptive flexibility), as in the prob- 
lem-solving tasks. These skills appear to be 
the same as the arrangement of name pairs 
in the proper sequence to pass a message or 
neighbors' jobs to earn a specified amount of 
money within a given time limit. 

It thus appears that a problem-solving 
factor may be distinguished as one involving
the ability to sequentially identify and 
evaluate information (problem elements) in 
terms of given rules and conditions (problem 
statements) in order to reach a goal. In the 
case of the divergent-thinking fluency 
measures, while a problem statement does 
exist in each case, rules by which hypotheses 
may be generated and criteria by which hypotheses
may be tested are typically not included.
Therefore, a large number and variety
of ideas may be generated without the
necessity of either the identification of new 
information or the evaluation of ideas in 
terms of the original problem statement. 

In addition, the tasks loading on Factor 2 
do not appear to depend heavily on memory 
skills, for the memory test did not load sub- 
stantially on any factor. The logic test (an 
evaluation measure) did load on the prob- 
lem-solving factor, thus providing additional 
support for the importance of evaluative 
reasoning in the problem-solving process 
(Guilford, 1958, 1967, 1971; Johnson, 1972). 
Finally, in several analyses, the total flexi- 
bility score had secondary loadings on the 
problem-solving factor. This result sup- 
ports other findings (Houtz, Montgomery, 
Kirkpatrick, & Feldhusen, in press) and may 
suggest that the ability to “switch” categories 
of responses may aid in the evaluation as- 
pects of problem solving, perhaps by allow- 
ing subjects to more quickly identify and 
avoid “dead ends" or inefficient strategies. 

The third factor most likely reflects gen- 
eral school achievement. Of all of the tasks 
used in the present project, the word fluency 
task was the one most familiar to the chil- 
dren. In fact, teachers had made use of this 
type of task regularly as part of spelling and 
vocabulary exercises. Other tasks loading 
on this factor also involved mathematical 
and/or verbal components one might suspect 
would be related to student academic 
achievement and, consequently, teacher 
ratings. 


Summary 


Guilford (1958, 1967, 1971) has pointed 
out that studies have not isolated a single 
problem-solving factor. Rather, with every 
different problem-solving task employed in 
research, there could be a different set of
skills involved. The present research has, 
however, suggested a factor distinct from 
traditional divergent-thinking and school 
achievement measures that may characterize 
several types of tasks particularly useful for 
future investigations of human problem
solving. Limitations of the present research,
however, such as the grade level, ability 
range, and size of the sample employed 
suggest further research. In addition, the 
types of problem-solving and creative 
thinking tasks employable for this type of 
analysis have by no means been exhausted. 
Additional types of tasks may reveal, for 
instance, a further breakdown of hypothesis 
generation and hypothesis testing skills. 
Finally, the study of human problem solving 
needs additional theoretical attention, so 
that factor analytic procedures, as used in 
the present study, may be used in the future 
more often to test hypotheses rather than, 
as was the case here, to generate them. 


Leniency, Learning, and Evaluations


Previous studies have yielded mixed results concerning the relationships between
the amount students learn, instructor leniency, and student evaluations
of instructors. One reason for the conflicting results is that grades and learning
are highly correlated, and the effects of leniency and learning have not yet
been separated precisely. We present a three-equation model that disentangles
learning from leniency and relates the results to evaluations. The first
equation measures the impact of each instructor on the amount of knowledge
students have acquired. The second measures the leniency of the instructor.
The third relates these two measures to student evaluations of instructors.
For our sample, there was no significant relationship between either leniency
or learning and evaluations.


With student evaluations of instructor 
effectiveness playing an increasingly im- 
portant role in the determination of merit 
pay, promotion, and tenure, there is a 
growing interest in what these evaluations 
actually measure. Faculty members frequently
voice doubts about using student
evaluations because it is not clear to what 
extent they measure the leniency of the in- 
structors, the amount the instructors taught 
the students, or some other characteristic of 
the instructors. 

Several recent studies have documented 
a positive relationship between the grades 
economics students receive and the evalua- 
tions they give their instructors (Capozza, 
1978; Kelley, 1972). Similar results have 
been reported for other disciplines (Murray, 
Note 1) and across various disciplines 
(Feldman, 1976; Nichols & Soper, 1972; 
Perry & Baumann, 1973; Reuber, Note 2). 
These results are consistent with the view 
that instructors “buy” high evaluations (and,
they hope, higher pay, promotion, and ten- 
ure) by “giving” their students higher grades. 
The results are also consistent, though, with 
several other behavioral models. Students 
with higher grades may have given higher 
evaluations to their instructors because the 
instructors in these samples taught the 
brighter students. In fact, the brighter 
students may have sought out the better 
instructors. Alternatively, it is possible that 
a positive correlation between grades and 
evaluations could be observed if the better 
instructors, who justifiably received higher 
evaluations, taught their students more, so 
that their students justifiably earned higher 
grades. Finally, the causation may be in the 
opposite direction from that usually as- 
sumed, and “an instructor might grade a 
class harshly or generously because of the 
ratings he receives (or anticipates)” (Doyle, 
Note 3, p. 9). 

Many other studies have found no rela- 
tionship between grades and evaluations. 
These studies are well summarized by Cos- 
tin, Greenough, and Menges (1971) and 
Menges (1973). But as McKenzie and Tul- 
lock (1975) point out, the lack of a correla- 
tion between grades and evaluations does 
not necessarily lead to a rejection of the hypothesis
that more lenient instructors receive
higher evaluations. If instructors become
more lenient in the hopes of receiving
higher evaluations, the students may re- 
spond simply by studying less and learning 
less, yet receiving no lower grades. In this 
case, there could be a strong and positive 
correlation between evaluations and leniency 
but not between evaluations and grades. In 
general, the use of grades, uncorrected for 
the knowledge obtained by the students, as 
a measure of instructor leniency may be 
quite misleading. 

Attempts to measure the relationship
between learning and student evaluations of
instructor effectiveness have yielded mixed 
results. Capozza (1973) reported a negative 
and significant relationship between evalu- 
ations and the amount students learned, but 
he has since then indicated to us by corre- 
spondence that with a larger sample, his re- 
sults are no longer statistically significant. 
Besides using grades as a measure of le- 
niency, which we have already suggested 
may be inappropriate, Capozza also failed to 
include any variables in his model to explain 
why some students might learn more than 
others. Rodin and Rodin (1972) also found 
a significantly negative relationship between 
evaluations and the amount learned, but 
their study has been found lacking in several 
respects (see Eble, 1974; Frey, 1973), in- 
cluding small sample size and omitted vari- 
ables. 

Crowley and Wilton (1974) found a positive
but insignificant relationship between
some components of evaluations and the 
amount students learned in beginning eco- 
nomics. Significantly positive relationships 
have been reported by Gessner (1973), Frey 
(1973), Doyle and Whitely (1974), and Sul- 
livan and Skanes (1974). 

It appears from previous studies and from 
the criticisms leveled at them that the issues 
have been clouded by rhetoric and by the 
complexity of the relationships. What is 
needed is a model that accurately measures 
first, the impact of instructors on the amount
students learn, correcting for other possible
influences on learning. Second, the model
must measure the leniency of the instructor,
correcting for other influences (including the
amount learned) on students’ grades. And
third, it must relate these measures to students’
evaluations of instructor effectiveness,
correcting for other possible influences.
What is needed, then, is a sequential three-equation
model to determine the effects of
learning and leniency on evaluations. We 
now turn to our development of such a 
model. 


Specification of Model 


Important Variables 


The knowledge of economic concepts 
(KNOW) gained by a student in the micro- 
economic portion of a beginning economics 
course depends on many things, most of
which are quantifiable. A list of these fac- 
tors includes the following: 

1. Previous knowledge of economics 
concepts (PRE). Students knowing more 
economics at the beginning of a course may 
well know more than others at the end of the 
course, though they may not learn as much 
new material during the course. 

2. Amount of previous economics (PE). 
If the student has had an economics course 
previous to this one, we would expect that 
the student might know more at the con- 
clusion of the course. 

3. Amount of calculus taken by the student
(CALC). Because much of microeconomic
theory explicitly or implicitly deals
with differentiation and integration, students
with a calculus background may learn
the concepts more easily than students without
a calculus background. The amount
of calculus in a student’s background may 
also be a proxy for analytical and mathe- 
matical aptitude. (The latter was found by 
Crowley and Wilton, 1974, to have a signifi- 
cantly positive effect on the amount of eco- 
nomics learned by students in beginning 
courses.) 

4. Previous academic average (AA).
Students who have done well in the past in
terms of their grades tend to continue to do
well, either because of high aptitude or because
of high motivation. Ability to take
tests is a skill in itself; high academic average
is, in part, a reflection of this ability. Because
academic averages in secondary
schools are probably not commensurate with
academic averages for upper-class students
at universities, we have split AA into two
parts: AAF, which represents the previous (high
school) academic average of first-year students,
and AAU, which represents the previous
(university) academic average of
upper-class students.

5. Academic year of the student (Y). If
upper-class students are more mature than 
first-year students, they may learn more in 
a course. It is also possible, however, as 
suggested by Crowley and Wilton, that 
upper-class students view a beginning eco- 
nomics course as one which deserves less of 
their attention and effort, so that they learn 
less. Also, students who postpone taking 
their first economics course until their sec- 
ond or third year in the university may have 
less aptitude for it than first-year students. 
Clearly, a great deal of these effects will be 
picked up in our segmentation of previous 
academic averages by first year versus upper 
class. In the actual estimation procedure,
we further distinguish between second- and 
third-year students by including a dummy 
variable for third-year students. 

6. Time the class meets (T). Students 
might learn more in classes meeting at cer- 
tain times of day than they would from 
classes meeting at other times of the day. 

7. Size of the class (SZ). We include this 
variable to see if class size affects learning. 

8. Sex of the student (FEM). Crowley 
and Wilton (1974) found that female stu- 
dents learn significantly less in a beginning 
economics course than male students do. 
Their measure of amount learned, however, 
was biased against students beginning the 
course with less knowledge of economic 
concepts; so that if females began a course 
knowing less economics and improved their 
knowledge by the same absolute amount as 
males did, then the Crowley and Wilton 
measure of amount learned would yield a 
spurious result. We are including a dummy 
variable for females to determine whether 
the knowledge a student has of economic 
concepts in terms of absolute raw scores 
varies with the sex of the student, other 
things being equal. 


Method 


Subjects and Procedures 


Students in 14 sections of the microeconomics portion
of the Principles of Economics course at the University
of Western Ontario were given a 19-question multiple-
choice examination at the beginning of their first class
in September 1974.¹ This examination is the pretest.
The examination was administered by persons not 
teaching the course, and instructors of the course were 
not permitted to see the questions on the examination. 
Examination questions were designed to test students’ 
mastery of economic concepts rather than of economic 
jargon. This pretest serves as our measure of PRE, a 
student’s previous knowledge of economics. The same
examination was given to these students under exami-
nation conditions at the end of the term in December.
This posttest is used as our measure of KNOW, a stu-
dent’s current knowledge of economics.²

After the posttest was administered, we asked the 
instructors to indicate the degree of correspondence 
between the material covered by the posttest and ma-
terial covered in class. This correspondence was found
to be uniformly high for all sections (perhaps due to the 
use of a common text and reading list), and so we are 
fairly confident that our test measures areas of knowl- 
edge covered in all sections in the sample. 


Knowledge Equation 


In order to gauge an instructor's contribution to a 
student's knowledge of economics, we estimated the 
following equation: 


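$$\mathrm{KNOW}_{ij} = \alpha_0 + \alpha_1 \mathrm{PRE}_{ij} + \alpha_2 \mathrm{PE}_{ij} + \alpha_3 \mathrm{CALC}_{ij} + \alpha_4 \mathrm{AAF}_{ij} + \alpha_5 \mathrm{AAU}_{ij} + \alpha_6 Y_{ij} + \alpha_7 T_j + \alpha_8 \mathrm{SZ}_j + \alpha_9 \mathrm{FEM}_{ij} + \alpha_{10j} \mathrm{INST}_j + u_{\mathrm{KNOW}}. \tag{1}$$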


With the exception of u and INST, the variables in this
equation have been defined above: u is the residual
error term; and INST is a set of dummy variables, one
each for all but one instructor.³ The set of estimated
coefficients $\hat{\alpha}_{10j}$ thus gives us an estimate of the con-
tribution of each instructor to students' knowledge
relative to the contribution of the omitted teacher. A
high value of $\hat{\alpha}_{10j}$ will be associated with an instructor


1 The Principles of Economics course is taught at the 
University of Western Ontario in many sections, with 
an average enrollment of about 60 students per section.
The examination we used was a slightly modified ver-
sion of the microeconomics portion of a test designed
by Crowley and Wilton (1974), similar in nature to the 
American Test of Understanding College Economics 
but better suited for testing Canadian students. We 
eliminated some questions found to be fairly weak in- 
dicators of student knowledge by Crowley and Wilton 
and added a few questions to cover omitted material we 
felt ought to be included. A copy of the exam is avail- 
able from the authors. 

2 Students who dropped the course were omitted from 
the sample, as were those who managed to change sec- 
tions. Students are randomly assigned to sections at 
the University of Western Ontario and find it extremely 
difficult to change sections because of registration 
policies. Consequently, our final sample included 617 
students. 

3 See Beals (1972) for a description of dummy vari-
able regression techniques.

whose contribution to student knowledge is relatively
great, while an instructor with a relatively small con-
tribution will have a low $\hat{\alpha}_{10j}$.
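
The dummy-variable setup behind Equation 1 can be illustrated with a small sketch. The data below are synthetic, the function name `instructor_effects` is hypothetical, and only one covariate (standing in for PRE) is used, so this is not the estimation reported here. With a single covariate, least squares with a full set of section dummies has a simple closed form: the slope comes from within-section deviations, and each instructor's coefficient is that section's adjusted mean minus the adjusted mean of the omitted section.

```python
import random
from statistics import mean

def instructor_effects(pre, know, section, reference=0):
    """Least-squares instructor coefficients relative to a reference section.

    Equivalent to regressing `know` on an intercept, `pre`, and one dummy
    per section other than `reference` (single-covariate case only).
    """
    sections = sorted(set(section))
    x_bar = {s: mean(x for x, g in zip(pre, section) if g == s) for s in sections}
    y_bar = {s: mean(y for y, g in zip(know, section) if g == s) for s in sections}
    # Slope on the covariate, estimated from within-section deviations.
    sxy = sum((x - x_bar[g]) * (y - y_bar[g]) for x, y, g in zip(pre, know, section))
    sxx = sum((x - x_bar[g]) ** 2 for x, g in zip(pre, section))
    slope = sxy / sxx
    # Section intercepts, then differences from the omitted section.
    intercept = {s: y_bar[s] - slope * x_bar[s] for s in sections}
    return {s: intercept[s] - intercept[reference] for s in sections if s != reference}

# Synthetic illustration: 14 sections of 40 hypothetical students each.
rng = random.Random(0)
true_effect = [3.0 * j / 13 for j in range(14)]   # section 0 is the reference
section = [j for j in range(14) for _ in range(40)]
pre = [rng.gauss(8.0, 2.0) for _ in section]
know = [5.0 + 0.45 * x + true_effect[g] + rng.gauss(0.0, 1.0)
        for x, g in zip(pre, section)]

effects = instructor_effects(pre, know, section)
```

Because the reference section is omitted, `effects` has 13 entries, each read as a difference from that section, which mirrors how the instructor coefficients are interpreted in the text.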


Leniency Equation 


In order to determine the extent of an instructor’s 
leniency in assigning grades to students, we must control 
for variables other than leniency that may affect each 
student’s grade. Aside from instructor leniency, the 
grade⁴ a student receives (GRADE) will depend on
Variables 2 through 8, defined above, as well as on the 
amount that the student knows, which we measure by 
KNOW. Consequently, we estimated the following 
equation: 


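$$\mathrm{GRADE}_{ij} = \beta_0 + \beta_1 \mathrm{PE}_{ij} + \beta_2 \mathrm{CALC}_{ij} + \beta_3 \mathrm{AAF}_{ij} + \beta_4 \mathrm{AAU}_{ij} + \beta_5 Y_{ij} + \beta_6 T_j + \beta_7 \mathrm{SZ}_j + \beta_8 \mathrm{FEM}_{ij} + \beta_9 \mathrm{KNOW}_{ij} + \beta_{10j} \mathrm{INST}_j + u_{\mathrm{GRADE}}. \tag{2}$$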


In this equation, the set of estimated coefficients $\hat{\beta}_{10j}$
play a role analogous to that of $\hat{\alpha}_{10j}$ in Equation 1.
Here, the coefficients of INST provide a measure of the
leniency of each instructor relative to the leniency of the
omitted teacher. High values of $\hat{\beta}_{10j}$ will be associated
with relatively more lenient instructors.⁵


Evaluation Equation 


We can use the estimated coefficients $\hat{\alpha}_{10j}$ and $\hat{\beta}_{10j}$
from Equations 1 and 2 to explore the relative impor- 
tance of the instructor's teaching ability and the average 
leniency of an instructor in determining the student 
evaluation of that instructor. The evaluation ques- 
tionnaire included the following “overall effectiveness” 
question: "How would you rate your instructor in 
terms of general, overall effectiveness as a teacher?” 
Students were asked to give their ratings on an integer 
scale ranging from 5 (“outstanding”) to 1 ("poor"). 

It would perhaps be desirable, for the purposes of our
experiment, to identify each student's evaluation of his
instructor with the student's own knowledge and grade.
This was not possible because the evaluations were done
anonymously.⁷ As a result, we used section averages
for our regressions involving student evaluations of the
instructors. These section averages are denoted by $E_j$.
Our third equation is the following:


$$E_j = \gamma_0 + \gamma_1 \hat{\alpha}_{10j} + \gamma_2 \hat{\beta}_{10j} + u_E, \tag{3}$$


where j = 1 through 14. 

The independent variables in this equation are the 
estimated coefficients on contribution to learning (from 
Equation 1) and instructor leniency (from Equation 2). 
It should be noted that in this study, we are not at- 
tempting to explain all the factors that go into the de- 
termination of student evaluations of instructors. Our 
aim is more modest: The estimates of Equation 3 will 
indicate only whether or not the amount taught to 
students by an instructor and the instructor’s leniency 
in assigning grades have a statistically significant in-
fluence on student ratings of instructors.
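
The second stage in Equation 3 is an ordinary regression of section mean ratings on two estimated coefficients. A minimal sketch follows, with made-up numbers for five hypothetical sections; the function name `ols_two_regressors` is illustrative, and entering the omitted instructor with zeros on both measures is an assumption of the sketch rather than something stated above.

```python
from statistics import mean

def ols_two_regressors(y, x1, x2):
    """Return (intercept, b1, b2) from least squares of y on x1 and x2."""
    y_bar, x1_bar, x2_bar = mean(y), mean(x1), mean(x2)
    d1 = [v - x1_bar for v in x1]
    d2 = [v - x2_bar for v in x2]
    dy = [v - y_bar for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12          # zero if x1 and x2 are collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return y_bar - b1 * x1_bar - b2 * x2_bar, b1, b2

# Hypothetical first-stage estimates and made-up section mean ratings.
# The first entry plays the role of the omitted (reference) instructor.
contrib = [0.0, 1.0, 2.0, 3.0, 1.5]
leniency = [0.0, 4.0, 1.0, 9.0, 5.0]
rating = [3.9, 3.6, 3.7, 3.4, 3.5]

gamma0, gamma1, gamma2 = ols_two_regressors(rating, contrib, leniency)
```

Here `gamma1` and `gamma2` correspond to the coefficients on contribution to learning and on leniency in Equation 3.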
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Results and Discussion 


The model described in the previous sec- 
tion was estimated using ordinary least 
squares. In this section, we discuss these 
results, focusing first on the estimates of 
Equations 1 and 2 and then on Equation 3. 


Knowledge and Reward 


Our estimates of the knowledge and grade 
equations are presented in Tables 1 and 2. 

Previous economics. It appears that 
having had an economics course prior to the 
college principles course has at best no effect 
on a student's knowledge or grade in the 
principles course. Having had previous 
economics may even have had an adverse 
effect on both KNOW and GRADE. In Re- 
gression 1, the coefficient on the dummy 
variable “no previous economics" is positive 
but insignificant; while in Regression 2, the 
coefficient of the same variable is positive 
and significant at the 10% level. Since
nearly all of those students who said they 
had economics prior to the principles course 
had such a course in secondary school, these 
results may shed some light on the teaching 
and learning of secondary-school economics. 
A student may take a high school course that 
is called an economics course, but which, in 
fact, emphasizes consumer economics, stock 


4 Course grades are assigned on a numerical scale, 
with 90-100 = A+, 80-89 = A, 70-79 = B, 60-69 = C, 
and 50-59 = D. 

5 What we are really interested in, of course, is the 
students’ perception of instructor leniency. Because 
perceived leniency may not be closely related to final 
grades in the course, in estimating Equation 2 we use 
each student's grade in the course just prior to the time 
the evaluations were conducted. The teaching evalu-
ations were carried out approximately 2½ weeks prior
to the end of the term's lectures.

6 There are only 10 questions used on the university
evaluation form; for decision-making purposes, ad-
ministrators concentrate on the students’ responses to
just this question on overall effectiveness.

7 In the past, students at the university, fearing re-
prisals from their instructors, refused to identify
themselves with their student numbers on evaluation
forms. This resulted in a high incidence of invalid re-
sponses, and the solicitation of student numbers was
abandoned in 1974. The use of section averages and
grouped dummy variables rather than individual ob-
servations may even be statistically better to the extent
that grouping the data masks errors-in-variables
problems (see Beals, 1972).


Table 1
Regression Results for Equations 1 and 2


                                Equation 1 (knowledge)          Equation 2 (grade)
Variable                        Coefficient  Standard error     Coefficient  Standard error
No previous economics               .250         .243               1.91         1.02
No calculus                        -.138         .228              -4.20          .962
High school grade average
  A                                 2.20         .348               10.7         1.51
  B                                 1.49         .264               3.10         1.12
  D                                 1.55         .931              -3.62         3.93
Previous college upper-class
grade average
  A                                 1.28         .730              13.38         3.08
  B                                 1.66         .466               6.21         1.98
  C                                 .115         .472              -3.66         1.99
  D                                 .455         .883                925         3.72
3rd-year student (a)                .026         .573               2.08         2.41
Female (b)                         -.240         .249               1.35         1.05
Pretest score                       .448         .044
Posttest score                                                      1.75         .160
Intercept                           5.87         .651               41.3         2.82
R²                                  .290                            .396
(a), (b) 1 = yes.


markets, real estate, and so on and tends to 
skim over material traditionally taught in a 
college economics course. The resemblance 
is not strong enough to help students per-
form better in the college course and may
even result in confusing them. Another 
possibility is that students are taught a 
Principles course in high school and are 
taught badly. Alternatively, students may 
arrive in the college course with some 
knowledge but a false sense of having already 
mastered the material. In any case, their 
performance in the college course could be 
adversely affected. 

Academic average. Students with aca- 
demic averages of A or B (prior to enrolling 
in the principles course) both did better on
our posttest and had higher November
grades than those with lower averages.⁸
Upper-class A and B students appear to get
higher grades than first-year students in
their section with similar knowledge and
[illegible] background. This is probably due
to [illegible] in the university (an
average in college generally represents
[illegible] better performance than it does
in secondary school) and to the greater ex-
perience upper-class students have in taking
college-level exams. Somewhat surprising
is the insignificant coefficient of an AAU of 
Grade A in Regression 1: Upper-class stu- 
dents with an A average do not know signif- 
icantly more economics at term’s end than 
do first-year students with a high school C. 
Yet, the more senior A student can expect a 
considerably higher grade (more than 13 
percentage points higher) in the course than 
a first-year student in his or her section with 
a C average and the same knowledge! 
Ability in writing college-level exams ap- 
pears to be handsomely rewarded. 

Sex of student. An interesting nonresult
is the fact that male and female students of 
like background do not differ significantly 
in their performance either on the posttest 


8 Something of a puzzle is the positive and significant 
(at the 10% level) coefficient of an AAF of Grade D in 
Regression 1. We have no entirely convincing expla- 
nation why first-year students coming in with a D av- 
erage should do 1.55 points better, ceteris paribus, on 
the posttest than otherwise similar students with C 
averages. Perhaps, being underdogs, they tried harder. 
At any rate, those in the AAF of Grade D category rep- 
resent a very small fraction (1.3%) of our sample. This 
result may therefore be due to extraordinary perfor-
mance by two or three students.


Table 2
Instructor Coefficients

              Equation 1 (knowledge)          Equation 2 (grade)
Instructor    Coefficient  Standard error     Coefficient  Standard error
 2            [illegible]      .520              3.01          2.20
 3            [illegible]      .584              3.95       [illegible]
 4               1.06          .536              5.18          2.26
 5               2.02          .603              3.92          2.56
 6               1.40          .612              2.92       [illegible]
 7               2.06          .521              .843       [illegible]
 8               1.15          .559              3.37       [illegible]
 9               .649          .575              8.96       [illegible]
10               2.95          .514              .188       [illegible]
11               .188          .534              10.9          2.25
12               1.09          .537              4.95          2.27
13               1.78          .660              5.26          2.79
14               1.75          .595           [illegible]   [illegible]

Note. The contributions of the other instructors to their students' knowledge and their leniency in grading were measured relative
to Instructor 1. Therefore, there are no coefficient estimates for him.


or in the course itself. Coefficients on FEM 
were insignificant in both regressions.⁹

Calculus background. Students who had
had no calculus course did slightly (but not sta-
tistically significantly) worse on our posttest
than did students who had had or were con-
currently enrolled in a term or more of cal-
culus.

Our posttest attempts to measure pri-
marily knowledge of and ability to deal with
basic economic concepts and does not reward 
analytical ability per se. Lectures and 
course tests, on the other hand, may be more 
directly concerned with the manipulation of 
tools of analysis and hence reward more 
highly those who have greater exposure to 
calculus, even though calculus is not explic- 
itly required in handling the problems. 
Although those without a calculus back- 
ground appeared to have approximately the 
same knowledge of economic concepts as 
their more numerate classmates, they 
seemed to be at a disadvantage in the course 
exams and assignments. 

Time and size of class. These variables 
were dropped from the regressions by our 
regression package. (None of the coeffi- 
cients associated with any of the time and 
size variables was significantly different from 
zero at the 99.999% level.) 

Pretest and posttest. Although we ex- 
perimented statistically using a series of 


dummy variables with nonlinear relation- 
ships between the posttest and the pretest 
and between the November grade and the 
posttest, the results did not differ noticeably 
from those reported in Table 1. 
Instructors and knowledge. The coeffi-
cients of INST in Regression 1, shown in
Table 2, give us our measure of each in- 
structor’s contribution to student knowl- 
edge. The omitted instructor is Instructor 
1; since all other coefficients are positive, his 
is the least contribution. The contribution 
to knowledge (or value added) by Instructors 
3, 9, and 11 is not significantly greater than 
his. At the other end of the scale is In-
structor 10, whose students can be expected
to score nearly 3 points higher on the post-
test than students of Instructor 1, even when
possible differences in class composition and
so on are controlled for. (A difference of 3
points on a 19-question test is quite sub-
stantial; the difference between the overall
posttest mean score and the overall pretest
mean was about 4.3 points.)

Instructors and leniency. Instructors
appear to differ substantially in their liber-
ality in grading. From the coefficients of the
instructor variables in Regression 2, we see


9 It might be noted that all but one of the instructors
in our sample were male, while 28.9% of the students
were female.




LENIENCY, LEARNING, AND EVALUATIONS 


that Instructor 1, the reference instructor, is 
the toughest grader. Several instructors are 
not significantly more lenient than he is. 
But a student of a given background, with a 
given level of knowledge of economics, could 
expect to receive a grade from 5 to 11 points 
higher from some of the other instruc-
tors.¹⁰


Value Added, Leniency, and Evaluations 


Having arrived at measures of each in- 
structor’s contribution to students’ knowl- 
edge and his or her leniency in grading, we 
are now in a position to confront the central 
question of this study: To what extent are
instructor leniency and value added corre- 
lated with high evaluations? Our measure 
of contribution to knowledge (CONTRIB) is 
the set of estimated coefficients for the in- 
structors from Regression 1; our measure of 
leniency (LEN) is the set of estimated coef- 
ficients for the instructors from Regression 
2. When E, the section mean responses to
the overall effectiveness question, is re- 
gressed on these variables plus an intercept 
term, the result is the following: 


$$E = \underset{(.465)}{3.87} - \underset{(.215)}{0.125}\,\mathrm{CONTRIB} - \underset{(.56)}{.086}\,\mathrm{LEN}, \qquad R^2 = .185. \tag{3a}$$


(The numbers in parentheses are standard
errors.) Both coefficients are quite close to
and not significantly different from zero.
Apparently, neither leniency in grading nor
contribution to students' knowledge has
appreciable influence on what students
consider effective teaching.¹¹ In order to
correct for what may have been a subjective
response by students to instructors with a
foreign (i.e., non-North American) accent,
we reestimated Equation 3, including the
dummy variable FOR (whose value is 1 for
instructors whose mother tongue was not
English):


$$E = \underset{(.328)}{3.37} - \underset{(.153)}{.086}\,\mathrm{CONTRIB} - \underset{(.044)}{.022}\,\mathrm{LEN} - \underset{(.248)}{.869}\,\mathrm{FOR}, \qquad R^2 = .633. \tag{3b}$$


Although inclusion of FOR substantially
improves the fit of the evaluation equation, 
the impact of CONTRIB and LEN becomes 
even smaller. 
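
One way to read these estimates is to divide each coefficient by its standard error. The short sketch below does this arithmetic for the three slope terms printed in Equation 3b; the dictionary and function names are illustrative only.

```python
# Coefficients and standard errors as printed for Equation 3b.
equation_3b = {
    "CONTRIB": (-0.086, 0.153),
    "LEN": (-0.022, 0.044),
    "FOR": (-0.869, 0.248),
}

def t_ratio(coefficient, standard_error):
    """Coefficient divided by its standard error."""
    return coefficient / standard_error

t_values = {name: t_ratio(b, se) for name, (b, se) in equation_3b.items()}
for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
```

The ratios for CONTRIB and LEN come out near -0.5, while the ratio for FOR is about -3.5, which is in line with the description of the first two coefficients as not significantly different from zero.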

These results, along with various other 
tests of the robustness of Regressions 3a and 
3b,¹² suggest that in evaluating an instruc-
tor's overall effectiveness, students are not
primarily (or even strongly) responsive ei- 
ther to the instructor's ability in developing 
their knowledge of economics or to the se- 
verity of the instructor's grading of student 
performance. 


Concluding Remarks 


If, as our results suggest, evaluations do 
not depend on leniency or learning, why have 
some other studies found a positive rela- 
tionship between grades and evaluations? 
Two possibilities immediately come to mind: 
First, the students and instructors in dif- 
ferent studies are not random samples from 


10 The following are some interesting sidelights: The 
instructor with the greatest contribution to student 
knowledge (Instructor 10) is one of the least lenient, 
while the instructor with the least value added (In-
structor 1) is also one of the least lenient. The most
lenient instructor (Instructor 11) has a value added not
significantly greater than that of the reference in- 
structor. 

11 Similar regressions using as the dependent variable 
the instructor's average evaluation on the 9 other 
questions yielded similar results. This is to be ex- 
pected, since simple correlations among all 10 questions 
were very high. We used scores on Question 10 alone 
because that measure plays a role in determining merit 
pay and promotions at the University of Western On- 
tario. 

12 Other independent variables that might influence
student ratings of instructor effectiveness are the
teaching experience and the sex of the instructor. We
reran Regressions 3a and 3b with variables accounting
for each instructor's total previous teaching experience,
previous principles teaching experience, or the square
roots of each of these, with no changes in the results
reported above. Size of class and the time the class met
were also insignificant. We could not include a dummy
variable for sex of the instructor because we had only
one female instructor in our sample. The results were
also unchanged when we dropped Instructor 1 or 11
(both outliers in some sense) or all instructors with a
foreign accent from our sample. None of the instructors
in our sample was French Canadian, and none had
British, Irish, or Australian accents. There was also
virtually no change in the results when we used final
term grades to estimate the leniency of the instruc-
tors.

the entire respective populations.¹³ Second,
in the studies that used individual data in-
stead of section averages, the observed re-
sults may be picking up the possibility that
those instructors taught primarily to the
brighter students (who consequently re-
ceived higher grades). Such behavior would
have been masked by our use of section av-
erages.
We would like to stress that we have not 
attempted in this study to capture all of the 
factors that determine evaluations; we have 
not attempted, in other words, to estimate 
the equation that best predicts E. What is 
being measured by student evaluations of
teaching effectiveness remains an open 
question and a disturbing one. Our findings 
lead us to believe that students may evaluate 
instructors on the basis of somewhat sub- 
jective feelings that are not related in any 
direct way either to the grades they receive 
or to how much they learn from the instruc- 
tor. First-year students (who comprise 
81.2% of our sample) may be particularly
sensitive to instructor characteristics that 
help make their transition from high school 
to the university less painful. Such char- 
acteristics may bear little relationship to 
leniency in grading or ability to convey 
knowledge of the subject.¹⁴
While we place a great deal of confidence
in our results, we should emphasize that they 
have been obtained from 14 sections of one 
beginning course in one department in one 
university. The results might be different 
for a different department, for students 
taking an upper-level course, for different 
types of students, or for instructors at dif- 


13 Because Murray (Note 4) has found results similar
to ours using a different methodology in beginning
psychology courses, one suspects that different schools
do indeed have student bodies and faculties that differ
markedly in motivation, interest, and ability. Sullivan
(Note 5) has suggested to us that one possible expla-
nation of the difference between our results and those
of Sullivan and Skanes (1974) is that in their sample,
instructors were rewarded according to how much their
students learned; whereas in our sample and in Mur-
ray's, such was not the case.

14 To the extent that upper-class students [illegible]
the adjustment to the university, we would expect them
to respond somewhat differently. If evaluations were
available on an individual student basis, tests of this
hypothesis would be most interesting.


J. PALMER, G. CARLINER, AND T. ROMER 


ferent universities. To see if these results
apply in other settings, we encourage those 
interested in pursuing the question further 
to adopt the approach we have used and to 
measure learning and leniency as accurately 
as possible.


Cognitive Style and Teacher-Student Compatibility
Janis Packer and John D. Bain 


University of Queensland, St. Lucia, Australia


This research studied the effects of cognitive style matching in 32 teacher-stu-
dent pairs. Tests of two cognitive style dimensions—serialism-holism and
field dependence-independence—were administered to 54 final-year trainee 
mathematics teachers and 58 first-year psychology students. From these, 32 
teacher-student pairs were formed so that teachers and students were 
matched or mismatched on one or both cognitive style dimensions, in con- 
formity with a 2 X 2 X 2 X 2 experimental design. Teachers structured and 
taught to their partners a 30- to 40-minute lesson on the mathematical concept 
of network tracing. Students were then independently examined. Matching
effects were obtained on objective test performance and on teachers' and stu-
dents' subjective ratings of each other at the extremes of the field depen-
dence-independence dimension.


Educational researchers have frequently 
acknowledged the need to adapt instruc- 
tional conditions and techniques to charac- 
teristics of the individual student. Much of 
the investigation of this question has been 
performed under Cronbach's (1967) “apti- 
tude-treatment interaction” (ATI) model, 
which has as its goal the discovery of signif- 
icant interactions between aptitudes and 
alternative treatments, on the basis of which 
students are differentially assigned to opti- 
mal instructional conditions. 

A theoretically based approach to ATI 
research has recently been proposed as an 
alternative to the trial-and-error method 
used in many previous investigations. The 
ATI approach (Allen, 1975; Di Vesta, 1975; 
Salomon, 1972) takes into consideration the 
cognitive processes assumed to be correlated 
with or measured by particular aptitude 
variables and considers the processes pro- 
duced or favored by particular treatments. 

Potential aptitude variables include the 
following three classes identified by Snow 
and Salomon (1968): (a) specific intellectual 
abilities, (b) specific personality traits, and 
(c) general cognitive styles and learning
strategies. Cognitive styles, in particular,
hold a great deal of promise for the theoret-
ically based approach to ATI research, since
they are directly related to the cognitive
processes of the learner and are consistent
across and relevant to a large number of
different tasks. Moreover, cognitive styles 
may provide broad and effective discrimi- 
nations between individuals that extend 
across the perceptual, intellectual, person- 
ality, and social domains. 

According to the theoretically based ap-
proach, in order to obtain an ATI with cog-
nitive style as the aptitude variable, alter-
native instructional treatments should be 
devised that compensate for the weaknesses 
and/or capitalize on the strengths of stu- 
dents with different cognitive styles. 
However, when these treatments are applied 
in the classroom situation, the teacher may
be an important modifying variable. 
Characteristics of teachers, in particular 
their cognitive styles, may influence their 
proficiency with different types of prescribed 
instructional treatments, may influence their 
choices of instructional approaches, or may 
influence their incidental behaviors in a way
that interacts directly with characteristics 
of students. 

The present study was concerned broadly
with the last two of these possibilities. Two
cognitive style variables that had previously
been shown to be fruitful in ATI research
were selected for investigation of teacher-
student compatibility effects. These were
serialism-holism (Pask, 1969, 1975) and field
dependence-independence (Witkin et al.,
1962).

Pask's (1969, 1975) distinction is between
two categories of mental competence—
"serialist" and “holist”—each of which leads 
an individual to consistently favor a partic- 
ular strategy in a variety of learning situa- 
tions. A serialist individual approaches a 
learning or problem-solving task in small, 
well-defined, and sequentially ordered seg- 
ments, related by simple links, and learns 
more effectively when material is presented 
in this way. A holist individual adopts a 
more global approach, learning about several 
topics at once, relating them with complex 
links, and learning more effectively when 
material is presented this way. The dis- 
tinction has predicted performance in sev- 
eral complex problem-solving tasks (Pask, 
1975; Pask & Scott, 1972). 

The field dependence-independence di- 
mension was originally defined in the task of 
locating the upright in space, on the basis of 
subjects' differential reliance on information 
from the visual environment (field depen- 
dent) as opposed to sensations from within 
the body (field independent). It has since 
been found to apply to a wide range of other
tasks. 

Witkin, Moore, Goodenough, and Cox 
(1977) review the implications of field-de- 
pendent and field-independent cognitive 
styles for the provision of alternative in- 
structional treatments. They conclude that 
in comparison to field-independent students, 
field-dependent students are (a) more at- 
tentive to and therefore better at remem- 
bering social material; (b) more reliant on 
external references and therefore more likely 
to require externally defined goals and re- 
inforcements; (c) less likely to make use of 
mediational processes such as analyzing,
structuring, abstracting, and general prin-
ciples; and (d) more likely to have difficulty
in accepting the irrelevance of salient at- 
tributes in concept learning. 

Witkin et al. (1977) also present some ev-
idence regarding the effects of teachers'
cognitive style. Field-dependent teachers
show strengths in establishing a warm and
personal learning environment and in en-
couraging student participation in setting
goals and directing learning. In contrast,
field-independent teachers are more ori-
ented toward the cognitive aspects of
teaching and show strengths in the organi-
zation and guidance of learning. Thus,
field-dependent teachers tend to favor dis- 
cussion methods, while field-independent 
teachers favor lecture methods. Matching 
teachers with students on their level of field 
dependence-independence has been shown 
to result in greater interpersonal attraction 
and more favorable subjective ratings of 
students by teachers than mismatching. 

The aims of the present study were to in- 
vestigate the effects of teacher-student 
cognitive style matching and mismatching 
on both interpersonal ratings and objective 
measures of student learning, to evaluate 
and compare the contributions of two dif- 
ferent cognitive style dimensions in pro- 
ducing such effects, and to determine 
whether these effects (if found) are mediated 
by definable differences in teaching strate- 
gies. 

Teachers and students were assigned to-
gether in pairs on the basis of a 2 × 2 × 2 ×
2 experimental design that varied the
teachers' cognitive style on two dimensions, 
each at two levels (field dependent vs. field 
independent and serialist vs. holist), and the 
students' cognitive style on the same di- 
mensions (field dependent vs. field inde- 
pendent and serialist vs. holist; see Table 1 
for clarification of the basic design). 

Dependent variables included the fol- 
lowing: (a) students’ scores on two tests of 
the material taught, one immediately fol- 
lowing and one a week after the teaching 
session; (b) teachers' and students' subjec- 
tive evaluations of the ease of teaching or 
learning from their partners; and (c) teach- 
ers’ written lesson plans, analyzed according 
to teaching strategy. Significant two-way 
interactions were predicted between teach- 
ers' cognitive style and students' cognitive 
style, on either or both dimensions, for stu- 
dent test scores and teacher-student evalu- 
ations, and significant main effects of 
teachers' cognitive style were predicted for 
teacher strategy. 


Method 


Subjects 


Tests of cognitive style were administered to 58 
first-year psychology students and 54 final-year trainee
mathematics teachers. On the basis of their scores, 32
subjects from each group were selected to participate 
in the experiment, the former as students (12 males and 
20 females) and the latter as teachers (21 males and 11 
females). Most participated in order to fulfill course 
requirements. 


Materials 


Tests of cognitive style. Field dependence-inde- 
pendence was assessed by means of the Group Em- 
bedded Figures Test (GEFT; Dumsha, Minard, & 
McWilliams, 1973). Serialism-holism was assessed by 
means of a version of the Gandlemuller Test (Pask,
1975, p. 108), which was modified to enable group ad- 
ministration. 

Instructional materials. The mathematical concept
of network tracing was selected as the topic to be taught
by teachers to their student partners. This topic ful- 
filled the two major criteria that (a) the topic would 
most likely be unfamiliar to the psychology students 
and (b) the topic could be learned in a brief teaching 
session of 30 to 40 minutes. Although it was anticipated 
that this topic might favor field independents, the 
compatibility interaction remained the main variation 
of interest. Two different approaches to the teaching 
of the network tracing concept have been presented by 
Arnold (1962) and Niman (1975). Arnold takes a quite 
formal approach, proceeding by definition of terms and 
proof of theorems. Niman, after initial definition of 
terms, proceeds by involving the student in the dis- 
covery of relationships. Both of these were provided 
to the teachers to afford them sufficient diversity of 
approach from which they could select suitable strate- 
gies. 

Two tests of the material were devised by experienced
teachers, one to be administered immediately after
learning and one for use as a retest. Only those aspects
of the topic discussed by both Arnold and Niman were
included in these tests.


Procedure 


Cognitive style tests were administered to the psy- 
chology students in groups of 10 to 12 and to the trainee 
teachers in the total group of 54. The GEFT was pre- 
sented first and the Gandlemuller Test second. There 
was a slight positive correlation (r = .28) between the
two tests in the total sample. 

Thirty-two students and 32 teachers were then se- 
lected so that extremeness of scores on both cognitive 
style dimensions was maximized. (Some middle-range 
scorers had to be included to make up the numbers,
however.) These subjects were then assigned to 32
teacher-student pairs that complied with the 2 × 2 ×
2 × 2 underlying experimental design; that is, each of
the 16 possible combinations of teacher's and student's
cognitive style was represented by 2 teacher-student
pairs. As a partial solution to the problem of having to
include middle-range scorers, the first 16 pairs formed
(1 pair per cell) were the best possible pairs in terms of
extremeness of scores and degree of match or mismatch
within pairs. A second set of 16 pairs that did not fulfill
these requirements quite as well was then formed from
the remaining subjects. As will be noted later, analyses
were based on collapsed versions of the basic design.

Those subjects participating as teachers were given 
copies of the two network tracing articles described 
above (Arnold, 1962, chap. 3; Niman, 1975). They were 
instructed to study both of these articles and to for- 
mulate their own lesson plan, which they would later 
follow in teaching the concept to a psychology stu- 
dent. 

The actual teaching sessions were conducted 1 week 
later, on a one-to-one basis, without observers.
Teachers were allowed up to 40 minutes with their 
students, at which time teachers were asked to finish as 
soon as possible (most teachers finished within 30 to 35 
minutes). Teachers and students were then separated 
and asked to evaluate their partners’ performance in 
terms of how easy each was to teach or to learn from. In 
addition, teachers were asked to estimate their students’ 
performance on a test of the topic (before seeing the test 
itself).

The first test was administered immediately upon the
students' completion of the evaluation. Students were 
then requested to return a week later, but they were not 
informed that they would be given a retest. 


Results and Discussion 


Basic descriptive data under the four-way 
design are presented for the total sample in 
Table 1. For the purpose of analysis, the 
four-way design was collapsed into two 
two-way designs, each of which varied 
teacher's and student's cognitive style on 
only one dimension, that of field depen- 
dence-independence or serialism-holism.
Therefore, in each of the two-way designs,
there were four possible combinations of
teacher's and student's cognitive style, each
represented by eight teacher-student pairs,
four from the first set (extreme scorers) plus
four from the second set (middle-range
scorers).

It should be noted that 1 of the original 32 
pairs was replaced because the teacher failed
to prepare a lesson, and 1 pair was removed
from the first set of 16 to the second set be- 
cause the student failed to return for the 
retest, and the score eventually obtained was 
of doubtful validity. 

The effects of teacher-student match-
mismatch were investigated on a number of
dependent variables: (a) students' perfor-
mance on a test of the concept immediately
following the teaching session, (b) students'
performance on a similar test 1 week after
the session, and (c) teachers' and students'
subjective evaluations of their partners'
learning or teaching efficiency. The effect
of the teachers' cognitive style on the struc-
ture and content of their lesson plans was
also investigated.


Table 1
Means and Standard Deviations on Objective Tests for the Total Sample (Four-Way Design)

                        Student's cognitive style
                        Field dependent             Field independent
Teacher's               Serialist     Holist        Serialist     Holist
cognitive style

Immediate test
  Field dependent
    Serialist      M    19.5          22.0          22.5          23.5
                   SD   2.12          1.41          [illegible]   [illegible]
    Holist         M    18.5          23.0          22.0          22.5
                   SD   6.36          0             1.41          3.54
  Field independent
    Serialist      M    19.0          24.0          22.0          17.5
                   SD   2.83          1.41          4.24          10.61
    Holist         M    15.5          18.0          22.0          20.0
                   SD   2.12          9.90          1.41          7.07

Retest
  Field dependent
    Serialist      M    12.0          20.5          22.5          19.5
                   SD   5.66          2.12          .71           2.12
    Holist         M    18.5          21.5          14.0          20.0
                   SD   [illegible]   [illegible]   5.66          0
  Field independent
    Serialist      M    18.0          17.5          19.5          14.0
                   SD   2.83          3.54          4.95          11.31
    Holist         M    14.5          18.5          21.5          15.0
                   SD   4.95          4.95          [illegible]   8.49


Objective Tests 


Immediate test. For the total group of 32
pairs, none of the main or interaction effects
reached significance. Even for the 16 most
extreme pairs, there were no significant ef-
fects under the serialism-holism design.
However, for the 16 extreme pairs under the
field dependence-independence design,
there was a significant main effect of stu-
dent's cognitive style, F(1, 12) = 8.80, p < .05
(ω² = …), and a significant interaction be-
tween teacher's cognitive style and student's
cognitive style, F(1, 12) = 5.97, p < .05 (ω² =
.16). Analysis of simple main effects under
field-independent teachers revealed that 
field-independent students performed sig- 
nificantly better than did field-dependent 
students, F(1, 12) = 14.63, p < .005, whereas 
there was no difference under field-depen- 
dent teachers, F(1, 12) = .137. Stated al- 
ternatively in terms of the effects of different 
teachers, field-dependent teachers were 
significantly more effective than field-in- 
dependent teachers with field-dependent 
students, F(1, 12) = 9.52, p < .01, but not 
with field-independent students, F(1, 12) = 
.137. This interaction is displayed graphi- 
cally in Figure 1A, and means and standard 
deviations are reported in Table 2.

Retest. Overall means were lower on the
second test than on the first, but the corre-
lation between the two tests, which were
designed to be parallel, was fairly high, r(30)
= +.71, p < .01. Again, the only effects to
reach significance were in the field depen-
dence-independence, most-extreme-groups
design: the main effect of student's cogni-
tive style, F(1, 12) = 14.07, p < .005 (ω² =
.32), and the interaction effect, F(1, 12) =
12.31, p < .005 (ω² = .27). The interaction
in this case was disordinal, by Bracht's
(1970) definition. As illustrated in Figure
1B, the differences between alternative
treatments (field-dependent vs. field-inde-
pendent teacher) at two levels of the apti-
tude variable (field-dependent vs. field-in-
dependent student) are both significantly
nonzero and different in algebraic sign: For
field-dependent students, field-dependent
teachers were superior to field-independent
teachers, t(6) = 2.83, p < .05; for field-in-
dependent students, field-independent
teachers were superior to field-dependent
teachers, t(6) = −2.10, p < .05. Means and
standard deviations are reported in Table 2.


Table 2
Means and Standard Deviations on Objective Tests for the Most Extreme Sample (Collapsed Field Dependence-Independence Design)

                        Student's cognitive style
Teacher's               Field             Field
cognitive style         dependent         independent

Immediate test
  Field dependent  M    22.5              23.25
                   SD   1.0               1.26
  Field independent M   16.25             24.0
                   SD   5.12              2.0

Retest
  Field dependent  M    19.25             19.5
                   SD   2.75              1.91
  Field independent M   14.25             21.75
                   SD   2.22              0.96
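
The F ratios and omega-squared values reported for the most extreme sample can be checked against the cell means and standard deviations in Table 2. The following is a minimal sketch, not the authors' procedure: it assumes a balanced 2 × 2 between-subjects design with four pairs per cell (16 pairs over four cells), takes the error mean square as the average of the four cell variances, and uses Hays's formula for omega squared. The function and variable names are illustrative only.

```python
def two_by_two_anova(means, sds, n):
    """Balanced 2 x 2 between-subjects ANOVA from cell summaries.

    means and sds are keyed by (teacher style, student style);
    n is the number of pairs per cell (assumed equal across cells).
    """
    # With equal n, the error mean square is the mean of the cell variances.
    ms_error = sum(sd ** 2 for sd in sds.values()) / 4
    df_error = 4 * (n - 1)

    fd_fd, fd_fi = means[("FD", "FD")], means[("FD", "FI")]
    fi_fd, fi_fi = means[("FI", "FD")], means[("FI", "FI")]

    # Each 1-df effect is a contrast L over the cell means; SS = n * L**2 / 4.
    contrasts = {
        "teacher": (fd_fd + fd_fi) - (fi_fd + fi_fi),
        "student": (fd_fi + fi_fi) - (fd_fd + fi_fd),
        "interaction": (fd_fd + fi_fi) - (fd_fi + fi_fd),
    }
    ss = {name: n * value ** 2 / 4 for name, value in contrasts.items()}
    ss_total = sum(ss.values()) + df_error * ms_error

    f_ratios = {name: value / ms_error for name, value in ss.items()}
    # Hays's omega squared for an effect with 1 degree of freedom.
    omega_sq = {name: (value - ms_error) / (ss_total + ms_error)
                for name, value in ss.items()}
    return f_ratios, omega_sq, df_error


# Cell summaries copied from Table 2, keyed by (teacher, student).
immediate_means = {("FD", "FD"): 22.5, ("FD", "FI"): 23.25,
                   ("FI", "FD"): 16.25, ("FI", "FI"): 24.0}
immediate_sds = {("FD", "FD"): 1.0, ("FD", "FI"): 1.26,
                 ("FI", "FD"): 5.12, ("FI", "FI"): 2.0}
retest_means = {("FD", "FD"): 19.25, ("FD", "FI"): 19.5,
                ("FI", "FD"): 14.25, ("FI", "FI"): 21.75}
retest_sds = {("FD", "FD"): 2.75, ("FD", "FI"): 1.91,
              ("FI", "FD"): 2.22, ("FI", "FI"): 0.96}

f_immediate, w_immediate, df_error = two_by_two_anova(immediate_means, immediate_sds, n=4)
f_retest, w_retest, _ = two_by_two_anova(retest_means, retest_sds, n=4)

for label, f, w in (("immediate", f_immediate, w_immediate),
                    ("retest", f_retest, w_retest)):
    for effect in ("student", "interaction"):
        print(f"{label:9s} {effect:11s} F(1, {df_error}) = {f[effect]:.2f}, "
              f"omega^2 = {w[effect]:.2f}")
```

Under these assumptions the recomputed student and interaction F ratios agree with the reported values to within rounding of the tabled standard deviations, which suggests that the text and Table 2 are mutually consistent.
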

These results suggest that the previously 
reported disadvantage of field dependents 
in the mathematical domain may be re-
moved or lessened by assigning them to
teachers who are also field dependent. Al-
though the teacher's cognitive style seems
to be less influential in the performance of
field-independent students, it may be that
on a task in which such students are disad-
vantaged, the students also would benefit
greatly from the matching approach.


Subjective Evaluations 


Immediately after the teaching session,
students were asked to evaluate the ease
with which they had learned the topic from
their particular teacher, and teachers were 
asked to evaluate how easy the student was 
to teach. Four response categories were 
provided: very easy, easy, difficult, and very 
difficult. Only 2 of the 64 evaluations fell 
outside the first two categories; so to simplify 
calculations, this variable will be considered 
as dichotomous, one response (“very easy”)
being positive and the other (“easy”) being 
neutral. The two responses not included in 
these categories (one given by the teacher 
and one by the student of a single pair) will 
be scored as neutral for the sake of conve- 
nience. 

The set of teachers' and the set of stu- 
dents' responses were each arranged in terms 
of two three-way contingency tables, scored 
separately for field dependence-indepen-
dence and serialism-holism: Teacher's
Cognitive Style × Student's Cognitive Style
× Positive Versus Neutral Response. These
were analyzed by means of the G test (Sokal 
& Rohlf, 1969). 

No significant effects emerged on teach- 
ers' ratings of students. However, when 
students’ ratings of teachers were consid- 
ered, significant effects were obtained under 
the field dependence-independence design. 
The test of independence between teachers'
cognitive style and positive versus neutral
response was significant, G(1) = 4.97, p <
.05, indicating that students' evaluations of
teachers were influenced by the teachers'
cognitive style, more positive ratings being
given to field dependents than to field in- 
dependents. 

Under this model, the effect of cognitive 
style match-mismatch is conveyed by the 
three-factor interaction, which in this case
did not reach significance, G(1) = 2.35, p >
.10. Further analysis indicates, however,
that the tendency of students to give more
positive ratings to field-dependent than to
field-independent teachers is due almost
entirely to the field-dependent students, as
shown by teachers' cognitive style related to
positive vs. neutral response: For field-
dependent students, G(1) = 6.90, p < .05; for
field-independent students, G(1) = .41, p >
… (see Figure 2). This finding parallels the
interaction effect obtained on students' test
scores in that the teachers' cognitive style
had greater influence over both the perfor-
mance and evaluations of field-dependent
than of field-independent students. Again,
task variables may be involved.

Figure 2. Interaction between students' and teachers'
cognitive style on students' evaluations of teachers.
(FD = field dependent; FI = field independent.)
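
The G statistic used for the contingency tables above is the log-likelihood ratio, G = 2 Σ O ln(O/E), referred to the chi-square distribution. The sketch below shows the calculation for a single 2 × 2 table (teacher's cognitive style by positive versus neutral response). The cell counts are hypothetical, since the article does not report the frequencies, and no continuity or Williams correction is applied; whether the authors applied one is not stated. These particular counts happen to give a value close to the 6.90 reported for field-dependent students, but that should not be read as a reconstruction of the data.

```python
from math import erfc, log, sqrt


def g_test_2x2(table):
    """G test of independence for a 2 x 2 table of counts (no correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:  # an empty cell contributes nothing to the sum
                g += observed * log(observed / expected)
    g *= 2

    # Upper-tail probability of chi-square with 1 degree of freedom.
    p = erfc(sqrt(g / 2))
    return g, p


# Hypothetical counts: rows are field-dependent and field-independent teachers,
# columns are "very easy" (positive) and "easy" (neutral) ratings.
hypothetical_counts = [[7, 1],
                       [2, 6]]
g, p = g_test_2x2(hypothetical_counts)
print(f"G(1) = {g:.2f}, p = {p:.4f}")
```
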
Estimations of student performance. On 
completion of the teaching sessions, teachers 
were also asked to predict their students’ test 
performance (before seeing the actual test). 
It is conceivable that cognitive style 
match-mismatch effects may influence ei- 
ther these estimations themselves (i.e., their 
favorability) or the discrepancy between 
predicted and actual student performance 
(i.e., their accuracy). There were no signif- 
icant effects on the estimations themselves, 
but when accuracy was used as the depen- 
dent variable, a significant interaction effect 
did emerge under the field dependence- 
independence, most-extreme-groups design, 
F(1, 12) = 7.16, p < .05 (ω² = .28), teachers
being more accurate in their predictions 
when matched with their students than 
when mismatched. This interaction is il- 
lustrated in Figure 3. Field-independent 
teachers were significantly more accurate
with field-independent than with field-
dependent students, F(1, 12) = 7.61, p < .05.
Field-dependent teachers were more accu-
rate with field-dependent than with field-
independent students but not significantly
so, F(1, 12) = 1.05. This finding again par-
allels the results obtained on objective test
data in that field-independent teachers, but
not field-dependent teachers, were more
effective with and more accurate in pre-
dicting the scores of students with whom
they were matched and suggests that field-
independent teachers may be less aware of,
or less able to adapt to, cognitive style dif-
ferences than field-dependent teachers.

Figure 3. Interaction between most extreme students'
and teachers' cognitive styles on accuracy of teachers'
predictions of student performance. (FD = field de-
pendent; FI = field independent.)


Lesson Plans 


Teachers were given two articles, each 
taking a different approach to the concept of 
network tracing, and were asked to formu- 
late their own lesson plans. These written 
plans were collected after the teaching ses-
sion and were analyzed according to the
following criteria: (a) which article the plan 
was more closely related to, (b) amount of 
restructuring imposed on the presentation, 
(c) type of restructuring imposed (if any), 
and (d) amount of content included. 

No significant differences were obtained 
between field-dependent and field-inde- 
pendent teachers or between serialist and 
holist teachers on any of these criteria. The 
tendency of teachers to follow the standard 
approaches fairly closely, together with the 
much greater popularity of the Niman arti- 
cle, may have inhibited the mediation of 
cognitive style effects through the teachers’ 
choice of broad instructional strategies or 
sequencing approaches. It is conceivable, 
however, that matching effects may have 
been mediated by plans and strategies not 
manifest in the written lesson plans. 


Conclusions 


It has been demonstrated that students'
objective learning performance and subjec-
tive evaluations of the ease of learning, as
well as teachers' ability to communicate with
students and assess their progress, may
profit from the technique of cognitive style
matching. On the basis of the present re-
sults, however, these effects may be gener-
alized only to students and teachers at the
extremes of the field dependence-indepen-
dence dimension.

The contrasting lack of effects when seri- 
alism-holism was used as the cognitive style 
dimension may be due to a loss of validity in 
its measurement as a result of modifications 
to the original test or may be due to factors 
inherent in the dimensions themselves that 
render field dependence-independence more 
suitable for the matching approach. This is
closely tied to the question, yet to be re-
solved, of the processes that mediate the
matching effect. It appears that problems
of communication in mismatched pairs, in
particular (for this task) between field-in-
dependent teachers and field-dependent
students, may be an important factor; and
the possibility that differences in teaching
strategies are involved has not been ruled
out. More detailed analyses are needed of
the effects of teachers' cognitive styles on
actual classroom behaviors and teaching
strategies in order to test these alterna-
tives.

Cronbach's (1975) warning of the possi- 
bility of higher-order interactions should 
also be heeded. In addition, further research 
should be undertaken into the effects of 
different instructional topics, more pro- 
longed contact between teachers and stu- 
dents, and teacher-class rather than 
teacher-student encounters.


Doing Research the Hard Way: Substituting Analysis of
Variance for a Problem in Correlational Analysis 


Lloyd G. Humphreys
University of Illinois at Urbana-Champaign 


In a recent publication, Kirby and Das dichotomized two measures of individ- 
ual differences at medians and thereafter treated these measures as if they 
were independent variables in an analysis of variance. If instead they had an- 
alyzed their various measures by means of traditional correlational analysis, 
they would have had much more powerful tests of their hypotheses. They 
would also, in all probability, have been less inclined to interpret their results 
as if the dichotomized variables represented independent, causal antecedents
of their various measures of intelligence.


A recent article by Kirby and Das (1977) 
is an excellent example of the misapplication 
of the analysis of variance to research in in- 
dividual differences. At best, these 
misapplications result in less powerful tests 
of hypotheses than can be obtained from the 
intercorrelations of dependent and so-called 
independent variables. At worst, the results 
are misinterpreted by both authors and 
readers. 

The basic fact is that a measure of indi- 
vidual differences is not an independent 
variable, and it does not become one by 
categorizing the scores and treating the 
categories as if they defined a variable under 
experimental control in a factorially de-
signed analysis of variance. In the first
place, it is impossible to assign subjects at
random to levels of the “treatment.” Sec-
ond, the groups formed in this way will differ
from each other on all individual-differences
measures correlated with the one catego-
rized. There is no adequate method of
controlling these differences on correlated
measures. Third, if more than one indi-
vidual-differences measure is categorized
and used in a factorial design, the relation-
ships with the dependent variable will be
distorted unless the Ns in the cells reflect the
correlations among the “independent”
variables and unless a regression analysis is
substituted for an analysis that assumes an
orthogonal design. The amount of distor- 
tion depends on the size of the intercorrela- 
tions and the manner in which the measures 
are categorized (see Humphreys & Fleish- 
man, 1974). 

Causal inferences are possible when a re- 
lationship has been established between a 
truly independent variable and a dependent 
variable, given random assignment and 
control of other stimulus and situational 
variables, but the preceding discussion 
makes clear that such inferences are no more 
justified for individual-differences variables 
by the analysis of variance than by analysis 
of intercorrelations. Since nothing magical 
happens when individual-differences mea- 
sures are categorized and treated as if they 
were independent variables, the analysis of 
variance merely provides an uncontrolled 
correlation between two or more categories 
and a continuous dependent variable. This 
uncontrolled correlation is substantially 
attenuated, and the power of the statistical 
test accordingly reduced, by converting a 
continuous or quasi-continuous measure to 
a small number of categories. While power
can be increased after categorizing by dis- 
carding a category or categories from the 
center of the distribution, the increase does 
not compensate for the initial loss. Also, as 
a side effect, the inflation of differences be- 
tween means when extreme groups are used 
frequently inflates the experimenter's eval-
uation of the importance of those differ-
ences.

To summarize the defects in the analysis
of variance design, primarily there is a sub-
stantial loss in the power of the statistical
test of the research hypothesis. There is 
often, as well, an illusion of control of vari- 
ables and a tendency to interpret significant 
relationships causally. Also, when ortho-
gonality is imposed on nonorthogonal vari-
ables, relationships are distorted. It is valid
to characterize the use of a factorial-designed 
analysis of variance for individual-differ- 
ences variables as an inadequate, cumber- 
some method of research design and data 
analysis. 

With one minor exception—an interac- 
tion—all of the hypotheses of Kirby and Das 
can be tested with the correlations reported 
in their Table 1. The analyses of variance 
are unnecessary, crude, and misleading. 
The one exception noted could easily have 
been added if a correlational analysis of the 
variables in the full range of scores had been 
planned from the outset. The addition 
would consist of the correlation between the
product of the two “independent” variables
and the several dependent variables.

Before proceeding to compare the two
methods of analyzing the data, it is necessary 
to recapitulate briefly a description of the 
measures used by these authors. They used 
three tests—Raven’s matrices, figure copy- 
ing, and memory for designs—to measure 
their construct of simultaneous processing. 
A second set of three tests—serial recall, vi- 
sual short-term memory, and digit span— 
measured their construct of successive pro- 
cessing. Two tests—word reading and 
Stroop’s color naming—were used to mea- 
sure “speed.” The definitions of the factors 
were verified by rotating principal compo- 
nents, and estimates of factor scores were 
obtained. Factor scores were dichotomized 
at medians for simultaneous and successive 
processing, and analyses of variance were 
computed for four dependent variables.
These were reading vocabulary and reading
comprehension from the Gates-MacGinitie
test and verbal intelligence and nonverbal
intelligence from the Lorge-Thorndike
test.


Table 1
Correlations Computed in Two Ways Between the Factor Scores for Two Hypothetical Processes and Intellectual Test Scores

                          Simultaneous          Successive
                          processing            processing            Interactionᵃ
Test                      ANOVA   Composite     ANOVA   Composite     ANOVA

Reading vocabulary        .22     .37           .39     .50           .04
Reading comprehension     .26     .40           .41     .60           .03
Verbal intelligence       .24     .48           .29     .52           .05
Nonverbal intelligence    .28     .52           .29     .48           .01

Note. ANOVA = analysis of variance.
ᵃ Composite correlations could not be computed from published
statistics for the interaction, but the correlations would be of
trivial size.


The reanalysis of Kirby and Das's data
involved computing composite correlations
between the three tests used to define each
of the hypothetical constructs and the four
dependent variables. By assigning unit
weights to each measure in the composite, it
was possible to obtain all of the information
needed from the intercorrelations in their
Table 1. These correlations were then
compared with the product-moment corre-
lations computed from the F ratios¹ in their
Table 6.
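
The unit-weighted composite correlation described above can be obtained from intercorrelations alone. For k standardized measures summed with unit weights, the correlation of the sum with a criterion is the sum of the k validity correlations divided by the square root of k plus twice the sum of the correlations among the measures. The sketch below illustrates this standard formula; the correlation values are hypothetical and are not taken from Kirby and Das's Table 1, and the function name is illustrative.

```python
from itertools import combinations
from math import sqrt


def composite_correlation(r_with_criterion, r_among_components):
    """Correlation between a unit-weighted sum of standardized measures
    and a single criterion, computed from correlations only.

    r_with_criterion: correlation of each component with the criterion.
    r_among_components: full symmetric correlation matrix of the components.
    """
    k = len(r_with_criterion)
    numerator = sum(r_with_criterion)
    # Variance of the sum of k standardized variables.
    variance = k + 2 * sum(r_among_components[i][j]
                           for i, j in combinations(range(k), 2))
    return numerator / sqrt(variance)


# Hypothetical example: three tests defining one construct, one criterion.
validities = [0.30, 0.40, 0.50]
intercorrelations = [[1.0, 0.5, 0.5],
                     [0.5, 1.0, 0.5],
                     [0.5, 0.5, 1.0]]
r_composite = composite_correlation(validities, intercorrelations)
print(f"composite correlation = {r_composite:.3f}")
```

In this hypothetical case the composite correlates more highly with the criterion than the average of its components does, which is the usual result of aggregating positively correlated measures.
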

The comparison of the two methods of 
analyzing the data is presented in our
Table 1. It is immediately apparent that the 
composite correlations are substantially 
higher than those obtained in the analysis of 
variance. Since there is a common N for the 
correlations computed in the two ways, the 
procedure that produces the larger correla- 
tions constitutes ipso facto the more pow- 
erful method of analyzing the data. 


1 When there are only two levels of an independent
variable, the product-moment correlation with the de-
pendent variable is the square root of the ratio of the
sum of squares for the independent variable to the total
sum of squares. The root of Hays's omega squared is
the population estimate of the same correlation. When
more than two levels are involved, the root of omega
squared is the population estimate of eta, the nonlinear
correlation.


There are two influences that attenuate
the correlations derived from the analysis of
variance approach. One of these is the loss
of reliable information from categorizing.
The expected value of the product-moment
correlation obtained after dichotomizing at
the median one of the two variables corre-
lated is about four fifths of the correlation
between the two continuous distributions.
The second influence is the distortion in-
troduced by treating correlated individual-
differences variables as if they were orthog-
onal. This effect can be approximated by
the partial correlations between each of the
“independent” variables and the dependent
variable with the second “independent”
variable held constant. Since the correla-
tion between the two composites is .36, the
second effect is one of appreciable size.
After the dichotomization, the size of this
correlation is expected to drop substantially;
though in this instance, it drops more than
expected. The product-moment correlation
between the dichotomized independent
variables computed from the data reported
in Kirby and Das's article is .12.
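
The attenuation figures in this paragraph can be illustrated under an assumption the text does not state explicitly, namely that the variables are bivariate normal. In that case a median split of one variable multiplies the correlation by √(2/π) ≈ .80 (the "four fifths" cited above), and a median split of both variables turns a correlation ρ into (2/π)·arcsin ρ. The sketch below computes these expected values and checks the first by simulation; the function names, sample size, and seed are illustrative choices.

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def expected_after_one_split(rho):
    """Expected correlation after a median split of one normal variable."""
    return rho * math.sqrt(2 / math.pi)


def expected_after_two_splits(rho):
    """Expected correlation after median splits of both normal variables."""
    return 2 / math.pi * math.asin(rho)


def simulate_one_split(rho, n=100_000, seed=1):
    """Return (continuous r, r after a median split of x) for simulated data."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
    median = sorted(xs)[n // 2]
    split = [1.0 if x >= median else 0.0 for x in xs]
    return pearson(xs, ys), pearson(split, ys)


r_full, r_split = simulate_one_split(0.5)
print(f"attenuation factor, theory:     {math.sqrt(2 / math.pi):.3f}")
print(f"attenuation factor, simulation: {r_split / r_full:.3f}")
print(f"expected r after splitting both variables, rho = .36: "
      f"{expected_after_two_splits(0.36):.2f}")
```

Under the normality assumption, a correlation of .36 between the composites would be expected to fall to roughly .23 after both are dichotomized, so the observed value of .12 is indeed a larger drop than expected.
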

These authors also confuse Jensen's two-
level theory and general ability theory. It is
not necessary, for example, to rotate away
from a general factor in order to differentiate
between the two theories. Jensen's theory
(see Jensen, 1974) assumes that Level I and
Level II abilities differ qualitatively. A step
function between the two is involved.
Blacks can equal or exceed whites in Level
I but be substantially inferior in Level II.
Level I abilities are a necessary but not suf-
ficient condition for the development of
Level II abilities. Level I abilities are more
primitive or elemental. Jensen would
identify his Level I, as a matter of fact, with
the Kirby and Das factor of successive pro-
cessing, not with the general factor in their
intercorrelations.

In contrast, a general factor theorist ex-
pects to find measures of so-called Level I
loading on the general factor and on one or
more group factors. The general factor, not
the group factor of successive processing or
Level I, depending upon one's preference, is
basic or elemental. Also making contribu-
tions to the variances of Level I measures are
nonerror specifics and measurement error.
These measures differ from so-called Level
II measures in the size of their loadings on 
the general factor, but they also have larger 
specifics. A composite of Level I measures 
has substantially higher loadings on the 
general factor than any one component. 
Because individual measures of rote learning 
(Level I) do not measure the general factor 
as validly as reading comprehension or ab- 
stract reasoning (Level II), however, groups 
that have different means of scores on the 
general factor will differ more on measures 
of complex functions than on measures of 
more specific functions. Differences are 
quantitative—a matter of degree, not of 
kind. 

It is also noteworthy that one does not 
disconfirm the presence of a general factor 
by more or less equalizing variance among 
three orthogonal factors. Orthogonality of 
the reference frame does not affect the 
dependencies among the measures. In the 
present case, three oblique first-order factors 
will define a second-order factor that is the 
general factor. Furthermore, this general 
factor defined by first-order factors can be 
transformed into a general factor defined by 
the tests by means of a simple transforma- 
tion (Schmid & Leiman, 1957). A general 
factor plus three group factors, or four if the 
dependent variables are included in the 
analysis, represent a very different model 
than the levels model of Jensen. 

Whether the definition of separate group 
factors is necessary or whether the general 
factor is sufficient for a particular purpose 
requires research that extends beyond the 
present data. There are no significant dif- 
ferences between the correlations of so-called 
simultaneous and successive processing with 
the reading and intelligence tests. Neither 
are there any significant differences as a 
function of the reading or intelligence test 
entering the correlations. A general factor 
can describe adequately all of the data pre- 
sented by Kirby and Das. Unless and until 
the measures of the hypothesized processing 
measures are shown to furnish differential 
information that is statistically significant, 
there is no need to measure them indepen- 
dently of the general factor in human abili- 
ties. Also, there is never any need to di- 
chotomize them and treat them as if they 
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weaken our findings. 


The reaction of Humphreys (1978) to the
Kirby and Das (1977) article is a case
demonstrating the fine points of statistical
reasoning by choosing a wrong example.
Essentially, Humphreys’s technical comments
on dichotomizing data are correct. His
comments on Jensen’s Level I and Level II
abilities, however, add confusion to the
interpretation of Jensen. I cannot deal with
the issues on hierarchical abilities briefly.
The reader is referred to Jarman (1978) and
to a book on simultaneous-successive
processing by Das, Kirby, and Jarman (in press),
which will be published shortly. My
response will be restricted to the questions on
design.
Humphreys (1978) has shown that the
analysis of variance technique gave a
conservative test of our data, thereby showing
the “real” effect to be even stronger, which
does not change our conclusion. He is right
that the median-split technique should not
be used generally, but he neglects to
consider instances where it can be used with
profit. His trepidations about causation are
unwarranted, although some people think
that analysis of variance gives causality. In
short, most of what he says in this regard is
correct; but in his zeal, he has attributed
weaknesses to the Kirby and Das article that
do not exist. I shall elaborate on this in the
remainder of our present article.


Ronald F. Jarman and Thomas O. Maguire have
contributed substantially to the first and second
sections of this article.
Requests for reprints should be sent to J. P. Das,
Centre for the Study of Mental Retardation, University
of Alberta, Edmonton, Alberta, Canada T5G 2G5.
A Reply to Humphreys


John R. Kirby 
University of Newcastle 
Newcastle, Australia 


Humphreys’s comments on double-median splits are essentially correct, but
these are not relevant to the Kirby and Das article. His comments do not
weaken our findings.


Individual-Differences Variables and 
Causality 


Humphreys makes the point that an in- 
dividual-differences variable cannot be an 
independent variable in an experiment. 
However, we feel that it can still be a classi- 
fication variable in analyses of variance. We 
do not randomly assign children to age, but 
we use age as a classification variable in the 
analysis of variance. In fact, we dichotomize 
on age despite the fact that all children are 
on an age continuum. One issue is ran- 
domization of assignment, then, as a dis- 
tinction between experimental and nonex- 
perimental designs; the other issue is the 
breadth of the levels chosen in placing a 
continuous variable in distinct categories. 

Humphreys comments on attributing 
causality. Our attribution, which is not 
explicit, stems from the model of simulta- 
neous-successive processing (Das, Kirby, & 
Jarman, 1975) and not from the use of anal- 
ysis of variance. Further, his notion about 
causal inferences being possible from ran- 
dom assignments is true, but what he forgets 
is that they are also possible in other ways. 
For an extreme example, consider that 
planets go around the sun as a result of gra- 
vitational attraction, but there is no random 
assignment. 

Basically, there are two types of variables 
in our study: (a) some measures of specific 
processes that do not rely heavily on prior 
learning (simultaneous-successive) and (b) 
some measures of broad areas of learning, 
heavily dependent upon experience (par- 
ticularly the achievement data). What we 
have said is that the specific measures relate 
to the broad measures without saying there 
is unidirectional causation, and different
types of research are needed to answer the 
causation question. Our study has not 
purported to answer the question of causa- 
tion, so Humphreys is constructing a straw 
man here. 


Median Splits and the Orthogonality of 
Simultaneous-Successive Processes 


It is well known that less powerful tests are 
a result of median splits, but this is more 
important in cases where no significant ef- 
fects are found by the median-split method. 
That is, if our correlations based on split 
groups were insignificant, but by using the 
full range of individual differences the cor- 
relations became significant, then we would 
have committed a Type II error, and Hum- 
phreys would be correct by pointing out how 
this had happened. However, as it stands, 
he has just shown a stronger effect than what 
we have obtained by using analysis of vari- 
ance. The effects of dichotomizing are 
harmful only when extreme tails are shown, 
in which case, dichotomizing will give spu- 
riously high correlations. There is no gen- 
eral rule for what to choose as a point for 
dichotomizing. So perhaps, if it is done at 
all, it should just be at the medians, which
we have done. In effect, we have tested our
hypothesis very conservatively and found
the effects to be significant. The method we
have used might, at best, yield a Type II
error.
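
The claim that a median split gives a conservative test can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of the Kirby and Das data: the sample size, the population correlation of .5, and the random seed are all arbitrary choices, and the data are simulated normal scores.

```python
import math
import random
import statistics

def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(n=5000, rho=0.5, seed=1):
    """Correlate y with x, once using x as measured and once median-split."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
    cut = statistics.median(xs)
    split = [1 if x > cut else 0 for x in xs]  # high vs. low group
    return pearson(xs, ys), pearson(split, ys)

r_full, r_split = simulate()
print(f"continuous r = {r_full:.2f}, median-split r = {r_split:.2f}")
```

With normally distributed scores the dichotomized correlation comes out smaller than the continuous one, which is the sense in which a significant effect found after a median split understates the effect in the full range of scores.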

We object to Humphreys making his point 
by defining simultaneous and successive as 
simple composites of the various variables. 
In fact, as our factor analysis shows, we ob- 
tained three factors: simultaneous, suc- 
cessive, and speed. Then, we derived factor 
scores only for simultaneous and for suc- 
cessive. The three factors of simultaneous, 
successive, and speed have been shown to be 

orthogonal not only in the Kirby and Das 
(1977) article but also in many previous 
publications (see Das et al., 1975). However, 
we did not use equal Ns in our analysis of 
variance; instead, we let the Ns fill the cells 
to reflect the correlation, which according to 
Humphreys, is phi = .12. We would guess 
that the correlation is not significantly dif- 
ferent from zero, thus essentially reaffirming 




J. P. DAS AND JOHN R. KIRBY 


the orthogonal relationship between
simultaneous and successive factors. Under these
circumstances, Humphreys’s discussion in
regard to the influence of distortion in
treating correlated variables as though they
were orthogonal does not apply to our study.
We had introduced only a minor distortion
by changing the r between simultaneous and
successive, which was zero, to a phi between
simultaneous and successive, which was .12.
As we mentioned before, that is a
nonsignificant difference.
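
A phi coefficient for a double median split can be computed directly from the four cell frequencies. The sketch below uses hypothetical cell counts, chosen only so that phi comes out at the .12 value attributed to Humphreys; the total N of 100 is an assumption, since the sample size is not stated in this passage. The significance check uses the standard identity chi-square = N × phi² with 1 degree of freedom.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table laid out as [[a, b], [c, d]]."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

# Hypothetical counts: rows = high/low simultaneous, columns = high/low successive.
a, b, c, d = 28, 22, 22, 28
n = a + b + c + d

phi = phi_coefficient(a, b, c, d)
chi_square = n * phi ** 2
CRITICAL_05 = 3.84  # chi-square critical value, df = 1, alpha = .05

print(f"phi = {phi:.2f}, chi-square = {chi_square:.2f}")
print("significant at .05" if chi_square > CRITICAL_05 else "not significant at .05")
```

Whether a phi of .12 differs from zero depends on N, so the conclusion printed here holds only for the assumed sample size.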


Where Median Splits Have Been Useful 


Kirby and Das (1977) did not introduce
median splits into the research literature in
psychology. In fact, single median splits
have been done routinely by psychologists.
We divide people on the basis of age,
intelligence, and high and low verbal ability.
Double median splits such as the one we
used in our 1977 article can be easily found
in personality research. Consider, for
example, Eysenck's (1957) division of
individuals into four groups on the basis of
introversion and neuroticism. By using neat
median splits on introversion and on
neuroticism, he was able to categorize four
groups of individuals as extraverted
neurotics, introverted neurotics, stable
extraverts, and stable introverts. By extending
these individual differences on extraversion
and neuroticism scales to pathological
samples, he has demonstrated differences
between what he calls the hysterics
(extraverted neurotics) and the dysthymics
(introverted neurotics).

To give one example of the usefulness of
double median splits followed by analysis of
variance design, McLaughlin and Eysenck
(1967) found that neurotic extraverts were
superior to stable extraverts on the easy
paired-associate learning task, whereas the
reverse was true on the more difficult
paired-associate learning tasks. In the
Kirby and Das (1977) article, we did not find
any interaction that was significant. But
this does not undermine the general
usefulness of the double median split when one is
studying personality types or styles of
cognitive processing. Even in the Kirby and
Das study, it was easier to see if the order of
achievement in reading was related to the
type of information processing represented
by the four groups. We concluded that
those children who were high in both modes
of processing score highest in reading tasks,
and those who are low in both modes of
processing score the lowest; whereas if the
children are high in one and low in the other,
their performance falls between the two
extreme groups. These middle groups,
however, have been differentiated in other
research. For example, in a doctoral
dissertation by McLeod (1978), children who were
highly proficient in vocabulary but differed
in their comprehension were examined in
terms of their simultaneous-successive
processing. It was observed that those who
were high in comprehension were also high
in simultaneous processing irrespective of
their status in successive processing. In
other words, the high-simultaneous-low-successive
as well as the
high-simultaneous-high-successive groups did better in
comprehension than the remaining groups.
Another use of the median split analysis is to
be found in remedial work with children in
an Aptitude × Treatment interaction
design.
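
The four groups produced by a double median split can be formed as in the following sketch. The scores are invented, the identifiers are hypothetical, and the rule that a score exactly at the median counts as "low" is an assumption; the source does not say how ties were handled.

```python
import statistics

def double_median_split(scores):
    """Map each id to a (simultaneous, successive) high/low label.

    scores: dict of id -> (simultaneous score, successive score).
    Scores at or below the median are labeled "low" (an assumption).
    """
    med_sim = statistics.median([s for s, _ in scores.values()])
    med_suc = statistics.median([u for _, u in scores.values()])
    groups = {}
    for child, (sim, suc) in scores.items():
        sim_label = "high" if sim > med_sim else "low"
        suc_label = "high" if suc > med_suc else "low"
        groups[child] = f"{sim_label}-simultaneous/{suc_label}-successive"
    return groups

# Hypothetical factor scores for eight children.
SCORES = {
    "c1": (12, 9), "c2": (5, 4), "c3": (11, 3), "c4": (4, 10),
    "c5": (9, 8), "c6": (6, 5), "c7": (10, 2), "c8": (3, 7),
}

groups = double_median_split(SCORES)
for child, label in sorted(groups.items()):
    print(child, label)
```

When the two variables are uncorrelated the four cells fill about equally, as they do here; with correlated variables the cell frequencies become unequal, which is the situation the text cautions against forcing into equal Ns.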

Thus, while it is useful to remember that 
variables on which we do median-split 
analyses should be orthogonal and to re- 
member the dangers of forcibly making cell 
frequencies equal by assuming orthogonal- 
ity, we should not forget the benefits of di- 
viding groups on the basis of median splits. 
As far as the Kirby and Das (1977) article is
concerned, Humphreys’s (1978) cautions and
precautions do not weaken our findings or 
the theoretical context of simultaneous- 
successive processing in any manner. 
Advance Organizers That Compensate for the Organization of Text


Richard E. Mayer


University of California, Santa Barbara 


In Experiment 1, subjects read a 24-frame text on computer programming that
was presented in logical or random order. For random organization, subjects
given an advance organizer performed better on a posttest than control
subjects, but the opposite pattern obtained for logical organization. In
Experiment 2, subjects read a four-paragraph text concerning imaginary countries
that was presented in name or attribute organization. Low ability subjects
given an organizer prior to reading performed better on questions that
required integrating across different paragraphs of the presented text, and
subjects given the organizer after reading performed relatively better on
questions concerning information they had read within the same paragraph.
Apparently, advance organizers served as an assimilative context for unfamiliar
organizations.


The present article investigates the role
of advance organizers on learning from
unfamiliar text. According to Ausubel's (1968)
subsumption theory and Mayer's (1975b)
assimilation encoding theory, advance
organizers may be especially important for the
learning of technical, unfamiliar, or poorly
organized material because they serve the
following functions: (a) Availability: a
context is provided to which new material
may be assimilated. For example,
Ausubel (1968, p. 148) has argued that
meaningful learning requires having relevant
“ideas already available in cognitive
structure,” and that for advance organizers to
provide these “anchoring ideas or
subsumers” the advance organizer must be
“presented at a higher level of abstraction,
generality, or inclusiveness.” (b) Activation:
ing performed relatively better on ques-


the role of an organizer to “delineate clearly,
precisely, and explicitly the principal
similarities and differences between the new


Fitzgerald, 1961) have suggested using an
“expository organizer” when no anchoring ideas
are available to the learner, and using a
“comparative organizer” when anchoring
ideas are available.
There is some lack of agreement
concerning the empirical support for the effects
of advance organizers. Although Ausubel's
original studies conducted during the 1960s
(see Ausubel, 1977; Mayer, 1977b) produced
significant effects, the differences between
advance organizer and control groups were
often quite small. In reviews of more recent
work, there are many reported cases of
organizers affecting performance and some
cases where they do not (Ausubel, 19;
Barnes & Clawson, 1975; Lawton & Wanska,
1977; Mayer, 1977b; West & Fensham, 1976).
As West and Fensham (1976) have pointed
out, the best way to resolve the apparently
conflicting results is to test the predictions
of specific theories concerning organizers.
For example, Ausubel’s subsumption theory
predicts that organizers should not aid
learning under certain circumstances.
This article attempts to extend earlier
work by providing three cognitive theories
of advance organizers and an experimental
test. The assimilation encoding theory
states that advance organizers provide a
meaningful context (or anchoring ideas) and
encourage learners to integrate new
information within this context. Since learning
involves an integrative assimilation process,
this theory predicts that subjects should
acquire a broader learning outcome capable
of supporting transfer. Also, since technical
details may be lost in the assimilation
process, this theory predicts that retention of
specific details may be hindered. The
addition theory states that the advance
organizer group might perform better overall on
all types of questions due to having more
“anchors” for hooking up incoming ideas.
Finally, the reception theory states that
learning depends only on whether the
information was presented and received by the
learner; since the test is based solely on
information presented in the text, advance
organizers should have no effect. A recent
series of studies designed to test these
theories (Mayer, 1975a, 1976a, 1976b, 1977a)
clearly supports the predictions of the
assimilation theory when material is technical
and unfamiliar to subjects. For example,
giving an advance organizer prior to learning
a new computer programming language
resulted in superior far-transfer performance
but slightly poorer near-transfer
performance.
The present article investigates a second
major prediction of these theories concerning
effects of advance organizers on logically
and poorly organized text. The assimilation
theory predicts that posttest performance
should be improved by advance organizers
when the material is randomly (or poorly)
organized but not when it is logical; when the
material is logically organized, subjects may
be able to integrate the material on their
own, but when the material is not presented
in the optimal organization, a meaningful
learning set can serve as a context for
integrating and holding together the incoming
material. According to the addition theory,
posttest performance should increase for
both logical and random texts if advance
organizers are given. According to the
reception theory, advance organizers should
not influence performance for either type of
presentation if the test does not directly
involve the advance organizer material.
Since the literature on text organization 
has been reviewed elsewhere (Mayer, 1977b), 
it can be summarized here by stating that the 
results are contradictory. One important 
study that sheds some light on the conflict 
was Tobias’s (1973) study, replicated by 
Dyer and Kulhavy (1974), in which signifi- 
cant scrambling effects were obtained for a 
technical version of the text but not for a 
familiar version. These results encourage
the idea that poor organization can be com- 
pensated for by making sure the reader has 
a meaningful context for integrating the in- 
coming material, as would be expected for a 
familiar text. Another way to provide such 
a context when the material is technical is to 
use a familiar advance organizer, as will be 
investigated in the present studies. 


Experiment 1 


Method 


Subjects and design. The subjects were 56 college
students recruited from the psychology subject pool at
the University of California, Santa Barbara. The
subjects had no prior experience with computers or
computer programming. The design was a 2 × 2 × 2
factorial with the factors being organization of text
(logical vs. random), advance organizer (before vs.
none), and mathematical ability (high vs. low). Seven
subjects served in each cell, with all subjects
contributing measures for six within-subjects tests.

Materials. The materials included a 24-frame
sequence for basic computer programming, with each
frame consisting of 100-200 words typed onto a 4 × 6 in.
index card (modified from Mayer, 1976b). Two sets of
24 frames were constructed: The heading set contained
2-8-word underlined headings at the top of each frame,
and the no-heading set did not. In addition, a
500-word advance organizer which described a computer in
familiar terms was typed onto a sheet of paper (modified
from Mayer, 1975a), and a heading list containing a
listing of the titles of the 24 frames was typed onto
another sheet of paper. The organizer compared a
computer to familiar objects such as a ticket window,
scoreboard, notepad, and so on.

An 18-item test was constructed with individual
questions typed on 3 × 5 in. index cards. The questions
were modified from Mayer (1975a, 1976b) and consisted
of three items for each cell of a 2 × 3 factorial test design.
The factors were type of question (e.g., whether the
question asked the subject to generate a program, or to
interpret what a given program would do) and length
of question (e.g., whether the question dealt with a
1-line program, a 4-8-line program that did not involve
looping, or a 4-8-line program requiring looping).
Additional materials included an answer sheet for the
test and a subject pretest, which asked the subject to
give his or her Scholastic Aptitude Test Mathematics
(SAT-M) score and to solve six algebra substitution
problems.
Procedure. Subjects participated in the 12-hour 
experiment in groups of from 2 to 4, with subjects ran- 
domly assigned to treatments. First, subjects com- 
pleted the pretest; subjects who reported scoring above 
550 on the SAT-M were classified as high ability, and 
those who reported scoring below were classified as low 
ability. 
Instructions for the reading task were read to the
subjects. Prior to reading the 24 frames, the before
group was given the advance organizer sheet and
heading list to read at their own rates. The none group
received neither. The materials were then collected, 
and the 24-frame instructional decks were given to each 
subject. Subjects in the logical group received the cards 
in their natural sequence, while subjects in the random 
group received them in individual orders determined 
by random number tables (see Mayer, 1976b, p. 148). 
Subjects read at their own rates, but they could read 
only one card at a time, and they could not go back to 
read previous cards. 
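
The two presentation orders can be sketched as follows. The original study drew individual random orders from random number tables; this sketch swaps in Python's pseudorandom generator instead, and the function name `frame_order` and the use of a per-subject seed are assumptions made for illustration.

```python
import random

N_FRAMES = 24  # number of instructional frames in the deck

def frame_order(organization, subject_seed):
    """Return the order in which one subject sees frames 1..24.

    organization: "logical" (natural sequence) or "random".
    subject_seed: hypothetical per-subject seed, so each subject in the
                  random group gets an individual, reproducible order.
    """
    frames = list(range(1, N_FRAMES + 1))
    if organization == "logical":
        return frames
    if organization == "random":
        random.Random(subject_seed).shuffle(frames)
        return frames
    raise ValueError(f"unknown organization: {organization!r}")

print(frame_order("logical", subject_seed=7))
print(frame_order("random", subject_seed=7))
```

Each random order is a permutation of the same 24 frames, so the two groups receive identical content and differ only in sequence.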

Following reading the 24-frame deck, subjects were 
given instructions for the test and an 18-card test deck. 
The order of test items was random except that the 
three questions of each kind occurred together. 
Subjects solved at their own rates but could work on 
only one card at a time and could not go back to work 
on previous cards. 


Results and Discussion 


Three subjects failed to complete the ex- 
periment, and three subjects expressed fa- 
miliarity with computer programming, so 
new subjects were recruited in their places. 
Unlike previous experiments (Mayer, 1975a, 
1976b) subjects who reported low SAT-M 
scores and who failed to pass the pretest were 
retained (in the low ability group). Answers 
to the test were scored as correct or incorrect
and were analyzed by a 5-factor analysis of
variance (Text Organization × Advance
Organizer × Ability × Type of Question ×
Length of Question).

As expected, the high ability subjects
scored significantly higher than the low
ability subjects, with scores of 48% and 28%
correct, respectively, F(1, 48) = 18.29, p <
.001. There were no differences in overall
performance between subjects who were
given the advance organizer and subjects not
given the organizer, with 38% correct for each
group, F(1, 48) < 1. In addition, logical
organization produced overall scores that were
indistinguishable from the random
organization, 40% versus 36% correct,
respectively, F(1, 48) < 1.

Table 1
Proportion Correct Response for Before and
None Groups Based on Logical and Random
Texts

Advance            Organization
organizer       Logical    Random
Before            .36        .41
None              .44        .31

The main focus of the present study was
to determine whether advance organizers
serve to counteract the effects of poor text
organization. For example, if advance
organizers serve as assimilative contexts, then
one would predict that there should be no
positive effect of advance organizers for
logical organization but that advance
organizers should aid for random organization.
This prediction was upheld in a significant
interaction between text organization and
advance organizer, F(1, 48) = 4.12, p < .05.
Table 1 summarizes this pattern in which the
before group performed better than the none
group on random organization but worse
when the text was presented in logical
organization.
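
The crossover pattern in Table 1 can be checked with simple arithmetic. The cell proportions below are the values read from Table 1, where the print is partly unclear; they are treated here as an assumption that is consistent with the marginal percentages reported in the text (38% for each organizer group, 40% logical, 36% random). This is a descriptive sketch of the interaction contrast, not the analysis of variance itself.

```python
# Cell proportions correct, keyed by (advance organizer, text organization).
# Values as read from Table 1 (an assumption where the scan is unclear).
CELLS = {
    ("before", "logical"): 0.36,
    ("before", "random"): 0.41,
    ("none", "logical"): 0.44,
    ("none", "random"): 0.31,
}

def organizer_effect(organization):
    """Before-minus-none difference for one text organization."""
    return CELLS[("before", organization)] - CELLS[("none", organization)]

effect_random = organizer_effect("random")     # organizer helps
effect_logical = organizer_effect("logical")   # organizer hurts
interaction = effect_random - effect_logical   # difference of differences

logical_mean = (CELLS[("before", "logical")] + CELLS[("none", "logical")]) / 2
random_mean = (CELLS[("before", "random")] + CELLS[("none", "random")]) / 2

print(f"random text:  before - none = {effect_random:+.2f}")
print(f"logical text: before - none = {effect_logical:+.2f}")
print(f"interaction contrast = {interaction:+.2f}")
```

The two simple effects have opposite signs, which is why the organizer main effect is near zero while the interaction is reliable.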

In addition, the only other significant
effect was an interaction between advance
organizer and type of question, in which the
before group performed better than the none
group (38% correct vs. 29% correct) on
far-transfer questions involving interpretation,
while the none group outperformed the
before group on near-transfer questions
involving generation (46% vs. 39% correct).
This interaction, F(1, 48) = 4.81, p < .05,
replicates the results of previous studies
(Mayer, 1975a, 1976b). As in those studies,
a reasonable conclusion is that the advance
organizer allowed subjects to assimilate the
material and form a broader learning
outcome.


Experiment 2


Experiment 2 was intended to extend the
results of Experiment 1 by using two
different text organizations: by attribute or
by name of country. Test questions could be
answered based on one of the text frames
(same questions) or required information
from several different frames (different
questions). For example, an attribute
question dealt with just one attribute (but
many names), so it required information
from one frame of an attribute-organized
text and many frames of a name-organized
text. Similarly, a name question dealt with
just one name (but many attributes), so it
required information from one frame of a
name-organized text and many frames of an
attribute-organized text. If no organizer is
given, there should be a pattern in which
subjects perform better on same questions
than different questions. However, if an
advance organizer can serve as an
assimilative context, then this pattern should be
eliminated.
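
The distinction between same and different questions can be stated as a small rule. In this sketch the function names are hypothetical, and the figure of four frames per text is taken from the Method section that follows (four cards per text).

```python
N_FRAMES = 4  # cards per text, per the Method section

def question_kind(text_org, question_org):
    """Classify a question relative to the text a subject read.

    Both arguments are "name" or "attribute".
    """
    return "same" if text_org == question_org else "different"

def frames_needed(text_org, question_org):
    """How many frames hold the information a question asks about."""
    return 1 if question_kind(text_org, question_org) == "same" else N_FRAMES

for text_org in ("name", "attribute"):
    for question_org in ("name", "attribute"):
        print(text_org, "text /", question_org, "question:",
              question_kind(text_org, question_org),
              "-", frames_needed(text_org, question_org), "frame(s)")
```

A different question forces the reader to pull one fact from every card, which is where an integrating context would be expected to help.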


Method


Subjects and design. The subjects were 96 college
students recruited from the psychology subject pool at
the University of California, Santa Barbara. None of
the subjects had served in the previous experiment.
Twelve subjects served in each cell of a 2 × 2 × 2
factorial design. The factors were organization of text (name
vs. attribute), sequencing of advance organizer (before
vs. after), and ability of subjects (high vs. low). All
subjects received the same four tests, so comparisons
involving test type are within-subjects comparisons.
Materials. Materials consisted of two versions of the
text, an advance organizer, four tests, and a subject
questionnaire.
The text consisted of 16 sentences concerning 4
attributes (economy, politics, climate, geography) of 4
imaginary countries (Brontus, Atweena, Galbion,
Nurmania), with four sentences typed on each of four
3 × 5 in. index cards. Some of the country names and
attributes were taken from materials used by Schultz
and Di Vesta (1972). For the name-organization text,
each card contained the 4 sentences describing the
attributes of a single country, and the card was headed
with the name of the country. For the
attribute-organization text, each card contained 4 sentences
describing the same attribute for all 4 countries, and the
card was headed with the name of the attribute.
An example of the information on a name-organization
card is the following: “Facts about Galbion.
he E. is land-locked and has no outlet to the sea. 
goes in Galbion are generally mild. 
D. y, a military dictatorship is in charge of Gal- 
m In Galbion, the people work mainly in tourist 


An example of the information on an
attribute-organization card is the following: “Facts about
geographies. Galbion is landlocked and has no outlet
to the sea. An isolated island is the location of
Nurmania. There are many splendid lakes in Brontus.
There are many beautiful mountains all across
Atweena.”
The advance organizer consisted of an 8½ × 11 in.
sheet of paper that was divided into a 4 × 4 matrix of
squares; the squares were empty, but the rows were
labeled with the attribute names (economy, politics,
geography, climate), and the columns were labeled with
the country names (Brontus, Atweena, Galbion,
Nurmania).
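
The relation between the organizer matrix and the two text organizations can be sketched as a data structure: the matrix holds one fact per attribute and country, name cards are its columns, and attribute cards are its rows. The cell contents below are placeholders rather than the study's sentences, and the function names are hypothetical.

```python
COUNTRIES = ["Brontus", "Atweena", "Galbion", "Nurmania"]
ATTRIBUTES = ["economy", "politics", "geography", "climate"]

# 4 x 4 matrix keyed by (attribute row, country column).
# Placeholder strings stand in for the 16 sentences of the text.
matrix = {(a, c): f"<{a} of {c}>" for a in ATTRIBUTES for c in COUNTRIES}

def name_cards(matrix):
    """Name organization: one card per country (a column of the matrix)."""
    return {c: [matrix[(a, c)] for a in ATTRIBUTES] for c in COUNTRIES}

def attribute_cards(matrix):
    """Attribute organization: one card per attribute (a row of the matrix)."""
    return {a: [matrix[(a, c)] for c in COUNTRIES] for a in ATTRIBUTES}

print(name_cards(matrix)["Galbion"])
print(attribute_cards(matrix)["geography"])
```

Both organizations contain the same 16 facts; the empty matrix shown to subjects gives them the row and column structure without any of the cell contents.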

The four tests were typed onto 8½ × 5½ in. sheets of
paper. The recall-name test asked, “In the space below
write down all you can remember about Galbion.” The
recall-attribute test asked, “In the space below write
down all you can remember about the geography of each
country.” The inference-name test consisted of 12
fill-in questions that involved thinking about several
attributes of just one country; for example, “What is the
geography of the country with mild temperatures?
______” The inference-attribute test consisted of
12 fill-in questions that involved thinking about one
single attribute across several different countries; for
example, “The lakes of Brontus are comparable to the
______ of Galbion.”

The subject questionnaire solicited information 
concerning the subjects’ age, sex, mathematics experi- 
ence, SAT scores, and related information. In addition, 
three stop watches were used to record individual 
reading and solution times. Three partitioned booths 
were also used; each had partitions on three sides to 
prevent eye contact among subjects, and the partition 
farthest from the subject had a 12 X 6 in. window 
through which cards could be passed to the experi- 
menter. 

Procedure. Subjects were run in small groups of 2 
or 3 per session and were randomly assigned to treat- 
ments. Subjects were seated in separate booths and 
could not see one another. 

First, instructions for the reading task were read. 
Subjects were told to assume that they were diplomats 
and that they had to learn some new information. 
Subjects were told to read the first card, then slide it out 
the window when they had learned the information on 
it; then the next card was given, and so on. Thus 
subjects saw only one card at a time and could not go 
back to previous cards. Each subject worked at his or 
her own rate, and the experimenter recorded the total 
reading time for all four cards. The order of the four 
cards was randomized, except that subjects in the 
name-organization group received the four name cards 
and subjects in the attribute-organization group
received the four attribute cards.

In addition, subjects in the before group were given 
the advance organizer just prior to reading, but after the 
instructions. They viewed the advance organizer for 
60 sec with the instructions, “Some subjects have found 
that this system makes your task easier; you may study 
it for 1 minute and then I will take it away.” The after
group was given the same advance organizer and
instructions after reading and just prior to the test.

When a subject finished reading, instructions for the 
test were given. Subjects were to work at their own 
rates and try to get as much correct as possible. The 
first test was given, and when the subject was finished 
the subject slid it out the window; then the next test was 
given, and so on. Thus the subject worked on one test 
at a time and could not go back to work on previous
tests. The order of the tests was always recall-name,
recall-attribute, inference-name, inference-attribute.
The experimenter recorded the time spent on each
test.
When the subject finished all four tests, the subject 


questionnaire was given. This was done to reduce the 
chances that some subjects would be leaving the room 
while others were still working on the test. 


Results and Discussion 


An analysis of variance was performed on 
the total reading times using the three be- 
tween-subjects factors of organization of 
text, position of advance organizer, and 
ability of subjects. For purposes of this and 
all other analyses, subjects with SAT-M 
scores of 550 or below were counted as low 
ability, and subjects with scores above 550 
were counted as high. The attribute-orga- 

nization groups required much more reading 
time than the name-organization groups, 
with average reading times of 488 and 283 
sec, respectively, F(1, 88) = 43.01, p < .001. 
Apparently, in the present situation, the 
name organization was more natural and 
consistent with the subjects’ normal way of 
organizing information. This conclusion is 
similar to that of Schultz and Di Vesta (1972) 
based on the finding that more is recalled for 
name organization than attribute organiza- 
tion of characteristics of countries, and 
clustering in free recall tends to be by name 
for randomly presented information (see 
Mayer, 1977b, for a review). In addition, 
subjects who had seen the advance organizer 
prior to learning required less reading time 
(for example, the advance organizer saved 60 
sec for the attribute-organization subjects); 
however, differences involving this factor 
failed to reach statistical significance. 

A second analysis of variance was per- 
formed on the test performance of subjects. 
The data consisted of the proportion cor- 
rectly recalled on each of the two recall tests 
(out of 4 possible in each) and the proportion 
of correct answers on each of the two infer- 
ence tests (out of 12 possible on each). In 
scoring, misspellings of country names and
synonyms for attributes were allowed. The
between-subjects factors were the same as 
above, and the within-subjects factors were 
type of test (recall vs. inference) and
organization of test (name vs. attribute).
Performance for subjects who read the
attribute-organization text was significantly
better than for subjects who read the
name-organization text, F(1, 88) = 6.39,
p < .025. Thus, although the
attribute-organization text was much more
time-consuming to read, the additional activity
and effort required seems to have paid off in
higher test scores. As expected, high ability
subjects performed better on the test than
low ability subjects, F(1, 88) = 10.85, p < .01
(see marginals of Table 2).
A major question addressed by this
experiment was whether advance organizers
might serve to counteract the effects of
complex text organization. For example, if
the advance organizer serves as an organizing
or integrating context for the material in the
text, one prediction is that it should not have
a facilitative effect in situations where the
subject already has a good means of
remembering the information but that it
should have a facilitative effect in situations
where the presented information is
organized differently from the test. In the
present experiment, this pattern would be
indicated by an interaction among text
organization, advance organizer, and test
organization. Before and after subjects should
perform at similar levels for questions that
are organized in the same manner as in the
text (name-organization subjects on name
questions, and attribute-organization
subjects on attribute questions); however,
the before group should perform better than
the after group on questions that are in a
different form from the text organization
(name-organization subjects on attribute
questions, and attribute-organization
subjects on name questions). As shown in
Table 2, this interaction reflects the
performance of low ability subjects but not high
ability subjects and is consistent with the
idea that high ability subjects have their own
ways of integrating presented information:
interaction of Text Organization ×
Advance Organizer × Ability × Test
Organization, F(1, 88) = 4.19, p < .05.
In order to more closely analyze this interaction,
difference scores were constructed
for each subject by using the following formula:
difference score = (proportion correct
on name questions for name-organization
text subjects plus proportion correct on


Table 2
Proportion Correct Response by Advance Organizer and Ability Groups for Same and
Different Questions

                Same questions      Different questions    Average
Group           NT & NQ   AT & AQ   NT & AQ   AT & NQ      difference score
Low ability
  Before          .62       .74       .76       .74          -.14
  After           .60       .81       .59       .69          +.13
High ability
  Before          .80       .96       .73       .88          +.15
  After           .80       .87       .78       .83          +.06

Note. NT = name text; NQ = name question; AT = attribute text; AQ = attribute question.


attribute questions for attribute-organization
subjects) minus (proportion correct on attribute
questions for name-organization
subjects plus proportion correct on name
questions for attribute-organization
subjects). For the low ability subjects, the
before group received a difference score of
-.14 compared to +.13 for the after group,
t(46) = 2.07, p < .05; the scores of before
versus after groups for high ability subjects
(+.15 and +.06, respectively) were not significantly
different from one another (t < 1).
These results support the earlier prediction
that advance organizers should have their
strongest positive effect on tasks that are
different from the original organization and
for subjects who might not otherwise use
integrating contexts to encode the material.
The finding that organizers aided low ability
subjects to overcome poor text organization
is consistent with previous results in which
organizers tended mainly to aid low ability
learners (Mayer, 1976a; West & Fensham,
1976). Apparently, high ability learners
often have developed a useful encoding
strategy on their own.
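
The difference-score formula can be sketched as a small calculation. This is only an illustration: the function and argument names are assumptions introduced here, and the sketch is applied to the cell means reported in Table 2 for the low ability after group rather than to individual subject scores, which are not available.

```python
def difference_score(nt_nq, at_aq, nt_aq, at_nq):
    """Same-organization minus different-organization proportion correct.

    nt_nq: name text, name questions      (same organization)
    at_aq: attribute text, attribute questions (same organization)
    nt_aq: name text, attribute questions (different organization)
    at_nq: attribute text, name questions (different organization)
    """
    same = nt_nq + at_aq
    different = nt_aq + at_nq
    return round(same - different, 2)


# Cell means for the low ability "after" group as reported in Table 2.
low_after = difference_score(nt_nq=0.60, at_aq=0.81, nt_aq=0.59, at_nq=0.69)
print(low_after)
```

A positive score means better performance on questions that match the text organization; a score near zero or below means the presentation organization gave no advantage.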
Previous studies comparing name and
attribute organization for passages indicate
that subjects’ ability to integrate information
is influenced by presentation organization
(see Mayer, 1977b, for a review). For example,
Frase (1973) found that subjects
given attribute organization for passages
about the characteristics of four ships performed
better on questions involving one
attribute (such as, “What color was the mast
of the Squid that was red on the Shark?”),
while name organization subjects performed


better on questions involving one name (such 
as, "What color was the mainmast of the ship 
that had a red jogger?"). The present re- 
sults indicate that this pattern can be re- 
duced by the use of advance organizers and 
thus suggest that subjects are able to encode 
the information in a more integrated form 
than simply copying the presentation orga- 
nization. 


General Discussion 


In both studies, the results most closely 
supported the predictions of the assimilation 
theory; advance organizer subjects per- 
formed relatively better on questions that 
require integrating facts from across differ- 
ent sections of the text (i.e., different ques- 
tions), while control subjects performed 
better on using facts that had occurred near 
one another in the text (i.e., same questions). 
In Experiment 1, the comparison between 
same and different was a between-subjects 
comparison, since both groups received the 
identical questions, but for the logical 
subjects they were sames, and for the ran- 
dom subjects they were differents. In Ex- 
periment 2, the comparison was within- 
subjects, since all subjects received questions 
based on the presentation organization (such 
as name questions for the name group and 
attribute questions for the attribute group)
and questions based on a different organization
(such as attribute questions for the
name group and name questions for the attribute
group).

These results are consistent with two recent
studies dealing with related issues.
Lesh (1976) found that an organizer was
important in improving performance for a
important in improving performance for a 
hierarchically organized unit on finite ge- 
ometry but was not useful when the same 
information was organized as a spiral in 
which ideas were repeated and more inte- 
grated with one another. In another study, 
Schumacher, Liebert, and Fass (1975) found 
that an organizer increased performance for 
a passage concerning U.S. presidents when 
the passage was presented in six separate 
paragraphs but was not useful for an integrated
passage that contained transition
phrases built into the text. One conclusion
is that advance organizers are not needed in
texts that provide transitions and links between
sections.
Apparently, the advance organizers used
in these studies served as integrating contexts
to which new, incoming information
may be assimilated. The organizers served
to free the subject from the need to exactly
encode items in an awkward or unfamiliar
presentation order. When information is 
presented in a logical manner and the test 
questions reflect the presentation organi- 
zation, an advance organizer has no positive 
effect; however, when the material is pre- 
sented in an order that is inconsistent with 
the posttest question, then advance orga- 
nizers seem to have a facilitative effect. 
This effect provides an independent line of 
support for the assimilation theory, which 
states that the organizer provides an assimilative
context for organizing the incoming
information.


performance for all children.


A currently popular approach to the education
of learning disabled children involves
the use of diagnostic procedures that are
supposed to lead directly to the formulation
of individualized learning or remedial programs.
This technique, sometimes called
“psychoeducational diagnosis,” or the “diagnostic-prescriptive”
approach, attempts
to formulate an educational program that is
consistent with each child’s unique learning
style or set of capabilities. Learning style is
described in terms of a child’s pattern
of strengths and weaknesses in various
“psychological processes” thought to be essential
to learning. Once the child's profile
of abilities has been identified, the educator
can construct a remedial program designed
to capitalize on areas of strength and to
support the development of learning skills
that are weak.
An important key to the successful application
of psychoeducational techniques
lies in obtaining an accurate understanding
of the psychological processes that are deficient
in the child (Senf, 1973). In fact, the
idea that an educational program may be
tailored to the specific learning capacities of

Requests for reprints should be sent to Joseph Torgesen,
Department of Psychology, Florida State University,
Tallahassee, Florida 32306.


Memorization Processes Involved in Performance on the
Visual-Sequential Memory Subtest of the
Illinois Test of Psycholinguistic Abilities


Charles Bowen, Tony Gelabert, and Joseph Torgesen 
Florida State University 


Two studies were conducted in order to investigate memory processes in- 
volved in performance on the Visual-Sequential Memory subtest of the Illi- 
nois Test of Psycholinguistic Abilities. In the first study, 64 second and 
fourth graders were administered this subtest under standard conditions. 
The children were designated as either “labelers” or “nonlabelers” according 
to whether or not they used a stimulus labeling strategy to aid recall. Differ- 
ences in performance between second and fourth graders could be accounted 
for almost entirely in terms of the greater number of labelers in the fourth 
grade. In the second study, labelers and nonlabelers from each grade were 
formally trained to use a labeling strategy. The training resulted in improved 


a child presupposes that one knows what 
these capacities and skills are. This intro- 
duces the problem of assessment and diag- 
nosis. In order for the psychoeducational 
approach to work properly, it is essential that 
reliable and valid measures be used to obtain 
information about the child’s learning 
problems. 

Unfortunately, there have been very few 
empirical investigations designed to discover 
which psychological processes are actually 
involved in performance on many of the di- 
agnostic instruments in use today. Most of 
these tests have been examined in studies of 
predictive or concurrent validity, but none 
has received the kind of extensive experi- 
mental investigation required to identify the 
actual psychological processes that they 
measure (Estes, 1974). Much of the criti- 
cism of psychoeducational approaches to the 
education of learning disabled children is 
valid because there is little evidence that 
many diagnostic instruments measure the 
processes they are supposed to (Mann, 1971).
Thus, it is very important that investigators
interested in the diagnosis of various types 
of learning disorders begin to examine the 
range of psychological processes that are 
being measured by diagnostic tests. 

A description of performance in terms of 
psychological processes involves a specification
of a sequence of events or behaviors
involved in the accomplishment of a task. 
Recent research in developmental psychol- 
ogy has begun to study a variety of cognitive 
abilities or attainments in terms of under- 
lying processes (Flavell, 1977). This ap- 
proach is exemplified particularly in the 
study of memory development. One of the 
basic findings of research in this area is that 
most of the improvement in memory per- 
formance with age is the result of changes in 
the ways children approach memory tasks 
rather than an increasing memory capacity 
per se. That is, older children approach 
memory problems in a more planful and 
strategic manner than younger children, and 
this more efficient use of mnemonic strate- 
gies results in better performance (Brown, 
1975; Hagen, Jongeward, & Kail, 1975). 
Many differences in memory performance 
between normal children and some groups 
of exceptional children such as the mentally 
retarded or reading disabled also appear to 
be related to deficiencies in the use of mne- 
monic strategies (Brown, 1974; Torgesen, 
1977). 

Developmental research suggests that 
memory is a multifaceted skill and that a 
variety of different processes are involved in 
performance on memory tasks. Descriptions
of memory performance that take account
only of the amount that is remembered
on any given task are inadequate because
they do not suggest how the performance
may be improved (Glanzer, 1967). In
contrast, research on memory development
in exceptional children has repeatedly
demonstrated that a knowledge of the
memorization processes underlying good
performance on memory tasks leads directly
to training programs that can increase performance
dramatically (Campione & Brown,

The present study is designed to investi- 
gate memory processes involved in perfor- 
mance on the Visual-Sequential Memory 
subtest of the Illinois Test of Psycho- 
linguistic Abilities (ITPA; Kirk, McCarthy, 
& Kirk, 1968). This particular subtest was 
chosen because it is a part of a very popular
diagnostic instrument and also because the 
cognitive activities involved in performance 
on tasks like this have been well studied in
developmental research. On this task, the
child is shown a picture of a sequence of
“nonmeaningful” figures. The sequence of
figures is viewed for 5 sec, after which the
child must reconstruct the sequence from
memory using a set of plastic chips with
corresponding figures on them.

The Visual-Sequential Memory test was 
designed to assess “short-term memory for 
visual sequences” (Kirk & Kirk, 1971, p. 
115). Difficulties on the task are thought to 
be related to visualization problems or dif- 
ficulties in forming or retaining a visual 
image of nonmeaningful material. However,
research on memory development suggests
that other processes, relating to the use of
certain mnemonic strategies, may also contribute
to individual differences on this task.
One very common strategy that is used by
children in the early and middle elementary-school
grades is the application of verbal
labels to visual stimuli (Hagen & Stanovitch,
1977). Although Kirk and Kirk (1971)
acknowledge that the Visual-Sequential
Memory test may sometimes be “subverted”
into a test of auditory memory through the
use of labels, they also note that the figures
used in the sequences were constructed to
avoid this possibility. This research
attempted to investigate the importance of
stimulus labeling to performance on the 
Visual-Sequential Memory test. It is hy- 
pothesized that development in the use of 
this strategy accounts for most of the age- 
related improvement on the task at certain 
age levels. 


Experiment 1 


There are several different ways to in- 
vestigate the cognitive activities involved in 
performance on a task (Belmont & Butter- 
field, 1977). These activities may, for ex- 
ample, be inferred from patterns of perfor- 
mance as the task is altered systematically; 
or they may be observed directly in the form 
of overt behaviors. Another approach is to 
interview subjects after performance on the 
task in order to determine if any special 
procedures were used. Experiment 1 em- 
ployed the latter technique as an initial at- 
tempt to verify the extent to which children 
consciously applied verbal labels to the
stimuli of the Visual-Sequential Memory
test.


Method


Subjects. Thirty-two second-grade students and 32
fourth-grade students from the Developmental Research
School of Florida State University participated
in the study. Mean chronological age of the second-grade
sample was 7 years 10 months and for the fourth
graders was 9 years 10 months. Sixteen children of each
sex were tested at each grade level.

Procedure. The visual-sequential memory task was
administered in a standard fashion (see Kirk et al., 1968)
to each of the 64 subjects. All testing was conducted
in a mobile research laboratory. At the completion of
the test, subjects were questioned in order to determine
if they had labeled the stimuli as a mnemonic aid.
They were asked if they had done “anything special” to
help them remember the order of the stimuli. Following
this inquiry, which was conducted in an open-ended
fashion with nondirective probes sometimes
being used, each child who explained the use of a labeling
strategy was asked to tell the labels that had been
used. Only those children who gave an unambiguous
description of the use of labels as a strategy and who
were able to label two of the stimuli were subsequently
classified as “labelers” in the analysis of results.


Results and Discussion


Two aspects of the data are of particular
interest: the number of children in each
grade level classified as labelers and the re- 


to these questions are summarized in Table
1, with recall scores given in raw score form.
First, more fourth than second graders reported
the use of labeling as a mnemonic
strategy, χ² = 6.06, p < .02. Almost half of
the fourth graders reported the use of this
strategy, while only 16% of the second graders
did.
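
As a rough check on the grade comparison, the chi-square statistic can be recomputed from the labeler and nonlabeler counts given in Table 1 (5 and 27 in second grade, 14 and 18 in fourth grade). The sketch below assumes a Pearson chi-square on the 2 × 2 table without a continuity correction; the text does not state which variant was used, and the function name is introduced here only for illustration.

```python
def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: second grade, fourth grade. Columns: labelers, nonlabelers.
chi2 = pearson_chi_square_2x2(5, 27, 14, 18)
second_grade_share = 5 / 32   # proportion of second graders classified as labelers
fourth_grade_share = 14 / 32  # proportion of fourth graders classified as labelers

print(round(chi2, 2), round(second_grade_share, 2), round(fourth_grade_share, 2))
```

Under this assumption the statistic comes out at about 6.06, and the two proportions correspond to the "16%" and "almost half" described in the text.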

It is also apparent that children, whether
in second or fourth grade, who reported the
use of stimulus labeling attained higher recall
scores than those who did not. This


(Overall & Spiegel, 1969). This particular
procedure was used because of the unequal
number of subjects in each cell of the analysis
of variance, which led to nonorthogonality
of the main effects of grade and labeling
categories. In this analysis, both grade, F(1,
60) = 10.6, p < .01, and labeling, F(1, 60) =


Table 1
Recall Scores of Labelers and Nonlabelers in
Second and Fourth Grades


Grade          Labelers             Nonlabelers
level       n     M     SD       n     M     SD

Second      5   29.6   6.9      27   21.1   = 53.7
Fourth     14   30.7   5.4      18   23.2   3.7


36.3, p < .001, effects were significant. 
Thus, fourth graders recalled better than 
second graders, and those who labeled re- 
called better than those who did not. Fur- 
ther analysis suggests, however, that the 
grade effect was due entirely to the fact that 
there were more labelers in the fourth than 
second grade. Post hoc comparisons re- 
vealed no significant differences in recall 
scores between children of different grade 
levels who belonged to the same labeling 
category. Thus, all of the age-related im- 
provement in recall between second and 
fourth graders may be accounted for in terms 
of the greater likelihood of fourth graders to 
use stimulus labeling as a strategy on this 
task. 
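
The argument that the grade effect reflects the mix of labelers rather than grade itself can be illustrated with the cell sizes and means from Table 1. The sketch below is only an illustration under the assumption that the garbled table entries read 29.6 and 21.1 for the second-grade means; the dictionary layout and function name are introduced here and are not part of the original analysis.

```python
# (grade, category) -> (n, mean recall score), read from Table 1.
cells = {
    ("second", "labeler"): (5, 29.6),
    ("second", "nonlabeler"): (27, 21.1),
    ("fourth", "labeler"): (14, 30.7),
    ("fourth", "nonlabeler"): (18, 23.2),
}


def grade_mean(grade):
    """Mean recall for a grade, weighting each cell mean by its cell size."""
    rows = [(n, m) for (g, _), (n, m) in cells.items() if g == grade]
    total_n = sum(n for n, _ in rows)
    return sum(n * m for n, m in rows) / total_n


overall_gap = grade_mean("fourth") - grade_mean("second")
labeler_gap = cells[("fourth", "labeler")][1] - cells[("second", "labeler")][1]
nonlabeler_gap = cells[("fourth", "nonlabeler")][1] - cells[("second", "nonlabeler")][1]

print(round(grade_mean("second"), 2), round(grade_mean("fourth"), 2))
print(round(overall_gap, 2), round(labeler_gap, 1), round(nonlabeler_gap, 1))
```

The overall gap between grades is larger than the gap within either labeling category, which is the pattern expected when much of the grade difference comes from the larger share of labelers in fourth grade.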

Although these data are suggestive, they 
are not conclusive. They indicate that a 
fairly high proportion of children in the 
fourth grade applied a conscious labeling 
strategy to the Visual-Sequential Memory 
test, and they also demonstrate a positive 
relationship between reported use of the 
strategy and the recall score. However, be- 
cause they are correlational, the data do not 
establish that labeling, in fact, was the major 
cause of the higher recall scores. In order to 
provide additional clarification of the rela- 
tionship between stimulus labeling and 
performance on the Visual-Sequential 
Memory test at this age, a second experiment
was conducted. 


Experiment 2 


If the use of stimulus labeling was a major 
factor underlying differences in performance 
between those who reported labeling and 
those who did not, then training and support 
in the use of this strategy should have a 
strong effect on the performance of children
who did not label spontaneously. If training
is given to both labelers and nonlabelers, 
differences in recall between groups should 
be reduced. At the very least, such training 
should raise the performance of the nonla- 
belers to a level equal to that of the labelers 
in the first experiment. Experiment 2 em- 
ployed an instructional approach (Belmont 
& Butterfield, 1977) in an attempt to induce 
all children to use the same processing 
strategy on the Visual-Sequential Memory 
test. 


Method 


Subjects. Twenty-three second graders and 23
fourth graders from Experiment 1 were randomly se- 
lected for training and retest. The only constraint on 
selection was that all subjects who were found to label 
spontaneously in Experiment 1 were included in this 
experimental group. An additional 7 second graders 
and 8 fourth graders remaining from the original sample 
served as control subjects. 

Procedure. All experimental subjects were trained
and retested within 2 months of the first visual-sequential
memory testing. Training sessions were designed
to teach children the use of labels as a mnemonic
strategy. The subject was first provided with predetermined
names for each of the 11 visual stimuli. Only
names that had relatively high association value with
each stimulus were selected. None of the children experienced
difficulty learning the names, and this part
of training was accomplished in 3 to 5 minutes.
Following the learn i 
received three "labeling practice" trials on the Visual- 


stimuli, labeling was emphasized and encouraged by the 


visual-sequential 
memory task was given immediately after the practice 


ta Arkan 
bid standard administration proce- 


The control subjects were also given additional ex- 
perience in manipulation of the tiles preceding the 


was never suggested. These 


squares depicted on 10 × 15 cm cards using the visual-sequential
memory stimulus tiles as “building block”


Results and Discussion 


Comparisons of recall performance between
the first and second sessions for both


[Figure 1, panels: Second Grade; Fourth Grade]
Figure 1. Recall scores of experimental and control
subjects on the first and second administration of the
Visual-Sequential Memory test. (Labelers are designated
by an unbroken line with black end points, nonlabelers
by a broken line with black end points, and
controls by an unbroken line with white end points.)


experimental and control groups are pro- 
vided in Figure 1. It is clear from these 
graphs that the labeling instructions had a
powerful effect on performance at both
grade levels. Repeated measures analyses
of variance indicated a significant effect of
instruction for both the second, F(1, 21) =
21.2, p < .01, and fourth, F(1, 21) = 23.9, p
< .01, grades. However, for neither grade
was the Group (labelers vs. nonlabelers) ×
Sessions interaction significant. In other
words, the instructions had similar effects on
the performance of both labelers and nonlabelers.
Although the graphs suggest that
the difference in performance between
groups was narrowed by instructions in labeling,
the differences were not reduced
sufficiently for the effect to attain statistical
reliability. However, the instructions in
labeling did improve the performance of
both second- and fourth-grade nonlabelers


Figure 2. Recall scores of fourth-grade control
subjects who did not label on the first administration
of the Visual-Sequential Memory test. (Subjects who
labeled on the second administration are designated by
an unbroken line; subjects who did not label on the
second administration are designated by a broken
line.)


second session was slightly lower than for the 
first. A different picture emerges, however, 
when data from the fourth graders are con- 
sidered. In this case, the performance gains 
of the control group paralleled those of the 
group that was trained to use a labeling 
strategy. In order to help understand the
reasons for this apparently large practice
effect with fourth graders, interview data
were used to examine possible changes in
task strategy between sessions. According
to the criteria by which children were originally
classified as labelers, one half of the
fourth graders in the control group used a
labeling strategy spontaneously the second
time they were tested. In Figure 2, the
scores of children who spontaneously labeled
during the second session are presented
separately from those who did not. It is
apparent from this graph that almost all of
the improvement in scores between sessions
for the fourth-grade control group may be
accounted for by the four children who
adopted a different task strategy the second
time they were given the test. Although
these data were not formally analyzed because
of the small n, they provide additional
support for the hypothesis that the use of
labeling is an important strategy in accounting
for individual differences on the
visual-sequential memory task.


General Discussion 


Both studies reported here support the 
conclusion that performance on the Vi- 
sual-Sequential Memory test may be greatly 
facilitated by the use of verbal labels as a 
mnemonic strategy. The data from Exper- 
iment 1 support the more specific conclusion 
that most of the age-related improvement in 
performance on this task between the ages 
of 7 and 9 years is due to the greater likeli- 
hood of older subjects to employ a labeling 
strategy on this task. That is, skill in using 
verbal labels as a way of enhancing memory 
for nonmeaningful visual stimuli appears to 
be the major developmental achievement 
underlying the better memory performance 
of fourth-grade children on this task. It may 
be true, given the small number of second 
graders who were classified as labelers, that 
improvement in performance on this task at 
younger ages depends on the development
of other processes, such as the ability to focus 
attention or to utilize visual imagery. 
However, the fact that there were no differ- 
ences in performance between second- and 
fourth-grade children who used the same 
labeling strategy indicates that these other 
processes contribute little to improvements 
in performance on the Visual-Sequential 
Memory test after age 7 years. 

Although Experiment 2 demonstrated a 
strong effect of instructions in the use of 
verbal labeling on recall performance, the 
predicted interaction between groups and 
treatment was not obtained. Although such 
an interaction would have offered a conclu- 
sive demonstration of the role of verbal la- 
beling in accounting for individual differ- 
ences on the task, a reexamination of our 
training methods suggests several possible 
reasons why it did not occur in this case. 
First, the children were provided with 
stimulus labels for all of the stimuli. This 
served to increase performance of children 
who had labeled spontaneously because none 
of them had actually labeled more than five 
of the stimuli in the first session. Second, 
the actual training in use of labels as a 
strategy was very brief. This brief training
was probably not sufficient to make the
nonlabelers as proficient in the use of labels 
as a mnemonic device as the children who 
were already using the strategy sponta- 
neously. Thus, the continuing differences 
in recall between groups in the second session
may have been due to the fact that the in- 
structions not only provided extra help to 
the labelers but also were not sufficient to 
eliminate the differences in mnemonic skills 
between the groups. It should not be over- 
looked, however, that even the brief training 
provided in this study was sufficient to raise 
the performance of nonlabelers to a level 
equivalent to that of children who labeled 
spontaneously in the first session. 

In addition to providing some clarification 
of the developmental processes that underlie 
improvement on the Visual-Sequential 
Memory test with age, the data also suggest 
a possible reason for the very low stability 
coefficients for this test reported by Paraskevopoulos
and Kirk (1969). These authors
report a 5-month test-retest correlation
of only .28 for 8-year-old children. The
data for the fourth-grade control children
presented in Figure 2 suggest that at this
age level, children may make large gains in 
performance as a result of adopting a label- 
ing strategy on the task. In fact, in the 
present case, the average recall score of the 
children who switched strategies on the task 
was actually lower on the initial testing than 
the mean score of those who did not improve 
their performance between testings. Thus,
there was actually a negative relationship 
between first- and second-session perfor- 
mance within the fourth-grade control 
group. 

This study, in demonstrating the in- 
creasing use of verbal labels to mediate 
performance on a memory task in which the 
stimuli are presented visually, is consistent 
with the general body of developmental re- 
search on similar tasks. The reasons for the 
developmental changes in the use of such 

strategies are only imperfectly understood 
at the present time. Some explanations 
emphasize the fact that older children tend 
to be generally more active, planful, and 
strategic in their solutions to cognitive
problems (Flavell, 1971); while others center 
on the child’s increasing knowledge about
memory and memory processes, that is, the
child's “metamemory” (Flavell & Wellman,
1977). Recently, the role of school
experience itself in promoting the use of ef- 
ficient mnemonic strategies has been em- 
phasized in cross-cultural research (Cole & 
Scribner, 1977). This research suggests that 
the unique demands that schooling places on 
the memory system lead to the development
of a variety of strategies to enhance
memory performance. Thus, the processing 
activities or cognitive behaviors by which 
children accomplish various tasks may 
change as a result of their adaptation to the 
demands of school. For this reason, many 
tasks may measure quite different processes 
in children before they are exposed to 
schooling and after they have been in school 
for a few years. These considerations have 
important implications for the use of stan- 
dard diagnostic instruments and suggest 
that one must be very careful in attributing 
performance deficits to deficiencies in specific
psychological processes.


Role of Symmetry in the Mirror-Image Confusions
of Preschoolers


Juliet M. Vogel and Kathleen A. Loughlin


Children between 3 and 5 years old were given a location copying task both
with displays side by side (horizontally aligned) and
with displays aligned along their diagonals. Displays were pegboards of three 
levels of complexity: 2 X 2 holes, 4 X 4 holes, and 6 X 6 holes. Left-right re- 
versals were the predominant errors and were frequent for horizontally 
aligned displays; left-right reversals were less frequent and performance more 
accurate for diagonally aligned displays. Only for interior positions on the 6 
X 6 hole array were errors other than left-right reversals frequent; and for 
these positions only, alignment did not influence accuracy. These results fail
to support Bryant's hypothesis that mirror-image confusions are no more fre- 
quent than other in-line (in-row) errors, and that these errors result from de- 
pendence on an in-line comparison strategy. 


Young children often confuse mirror- 
image forms such as b and d (e.g., Davidson, 
1934) and mirror-image locations within 
displays (e.g., Laurendeau & Pinard, 1970). 
Simultaneously presented mirror-image 
forms are most frequently confused when 
they form a symmetrical pattern: left-right 
mirror images when they are aligned hori- 
zontally and up-down mirror images when 
they are aligned vertically (Huttenlocher, 
1967a, 1967b; Sekuler & Rosenblith, 1964). 
Such confusions probably reflect a general 
spatial comparison strategy that is also 
elicited by non-mirror-image displays pre- 
sented in the appropriate configuration. 
‘Thus, Barroso and Braine (1974) have shown 
that 3-year-olds who are instructed to orient 
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pairs of nonidentical figures so that they ar 
“turned the same way” show the same pal 
tern of alignment effects, making left-rigl 
reversals for horizontally aligned sideway 
figures and up-down reversals for vertical 
aligned upright and upside-down figures. 
Barroso and Braine (1974) suggest thi 
the alignment effects are a consequence! 
children’s putting corresponding parts í 
displays (e.g., tops or bottoms of figure 
close together, a strategy that e 
ability to use spatial coordinate syste! 
However, there are results for which thi 
explanation cannot account. Two studi 
(Chapman, 1970; Enterline, 1970) ha 
compared the more typical horizontal al 
vertical alignments with a condition in wl 
stimuli are aligned along their diagonal, 
shown in Figure 1. Although a strategy 
putting corresponding parts together co 
not be successful in the diagonal alignmé 
condition, both studies found that diagon 
aligned mirror images were discrimina 
more accurately than ones aligned $ 


strategies than do other arrays, or a strategy 
different from those previously proposed leads 
to the results under all of the conditions 
described. 
Information about strategies that mf 
Figure 1. 


bute to children's mirror-image con- 
is and lead to alignment effects can be 
ed from children's pattern of errors on 
copying tasks. Bryant (1973, 1974) 
has argued that mirror-image confusions on 
location comparison tasks are a result of 
children’s tendency to use an “in-line” spa- 
tial comparison strategy. He proposes that 
when a child compares locations in adjacent 
arrays, the child scans across the arrays and 
judges the locations of objects to be equiva- 
lent if they are in the same scan or “row” 
regardless of position in the row. Bryant 
therefore argues that mirror-image confu- 
sions should be no more frequent than other 
in-line errors. Support for Bryant's position 
is equivocal: Bryant (1973) provides data to 
support his position using a location com- 
parison task with 10 locations in a row, but 
Chapman (1970) used a similar task with 
three positions in a row and found more 
mirror-image errors than other in-line errors. 
The difference between the complexity of 
the arrays in the two studies suggested to us 
that Bryant's hypothesis might be true in 
only a very limited way: possibly children 
make indiscriminate in-line responses only 
to the extent that it is difficult for the chil- 
dren to discriminate among the positions 
being compared. 

Figure 1. Three types of stimulus alignment used in mirror-image discrimination tasks. 


It is consistent with Piaget’s work that the 
complexity of the arrays used in a location 
comparison task should influence the like- 
lihood of mirror-image confusions. Piaget 
and Inhelder (1948/1956) propose that 
children use topological spatial cues long 
before they can use a euclidean coordinate 
system, and considerable evidence supports 
this position (e.g., Laurendeau & Pinard, 
1970). According to Piaget and Inhelder, 
one of the topological cues used earliest is the 
cue of proximity. When a display contains 
rows of clearly discriminable locations, the 
end-of-row positions may be discriminated 
from interior positions by their proximity to 
an edge. Children who distinguish only 
between end and interior positions will make 
either correct or “mirror-image” responses 
if a row contains no more than two end po- 
sitions and two interior positions; however, 
when a row contains more than four posi- 
tions, such children will confuse non-mir- 
ror-image interior locations. 
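
The prediction in the preceding paragraph can be illustrated with a short sketch. It assumes a child who encodes only whether a hole is at the end or in the interior of a row and who is otherwise indifferent among holes of the same kind. The function names and the 0-based numbering of positions are illustrative assumptions and do not come from the study.

```python
def confusable_positions(row_length, target):
    """Positions a child might choose if only 'end' vs. 'interior' is encoded.

    Positions are numbered 0 .. row_length - 1 from left to right
    (an illustrative convention, not taken from the article).
    """
    ends = {0, row_length - 1}
    if target in ends:
        return sorted(ends)
    return [p for p in range(row_length) if p not in ends]


def classify_choices(row_length, target):
    """Split the confusable positions into correct, mirror-image, and other."""
    mirror = row_length - 1 - target
    result = {"correct": [], "mirror_image": [], "other": []}
    for p in confusable_positions(row_length, target):
        if p == target:
            result["correct"].append(p)
        elif p == mirror:
            result["mirror_image"].append(p)
        else:
            result["other"].append(p)
    return result


# Row of four: an interior target can only be confused with its mirror image.
print(classify_choices(4, 1))
# Row of six: an interior target can also be confused with non-mirror interiors.
print(classify_choices(6, 1))
```

Under these assumptions, a row of four positions yields only correct or mirror-image responses, whereas a row of six adds non-mirror-image interior confusions, which is the contrast the paragraph describes.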

Piaget and Inhelder's work suggests a 
different pattern of results for older pre- 
school children. These children are fre- 
quently able to copy a sequence of items, but 
they are likely to judge order in the sequence 
by the order in which elements are encoun- 
tered without taking into account any 
changes in direction of scanning, and there- 
fore the children are susceptible to confusing 
a sequence with its reverse. This suggests 
that when copying a position in a grid such 
as the one shown on the left side of Figure 2, 
the older preschool child will sometimes start 


Figure 2. Left panel shows 4 X 4 position grids in horizontal alignment, and right panel shows 6 X 6 
position grids in diagonal alignment. (For each of these pairs of grids, filled circles indicate locations 
in which the experimenter placed her peg in the present study, with one location used per trial; the pair 
of Xs indicates a sample with a left-right reversal response.) 


cate the finding that left-right mirror-i 


previous studies comparing diagonal 

other alignments, we used a location copying 
task in order to infer from the pattern of er- 
rors the location comparison strategy. 
Our displays were pegboards that provided 
clear grids of positions, and we varied 
complexity so that we could discriminate 
among the strategies described above. 


Method 
Subjects 


Thirty-two 3½-year-olds (3 years 2 months to 4 
4 months) and 32 5-year-olds (4 years 8 months 
years 5 months) participated in this study. The chil- 
dren were in a nursery school serving middle- 
class families in central New Jersey. Half of the 
children from each age group were males and half were 
females. 


Apparatus 


"Three pairs of Tinker Toy plastic pegboards 
were used: 2 X 2 hole, 4 X 4 hole, and 6 X 6 hole, 
of 5.7 X 5.7 cm, 11.4 X 11.4 cm, and 17.1 X 
17.1 cm, respectively. Two identical Tinker Toy figu- 
rine pegs served as location markers. 


Procedure 


Each child was tested individually, with the child 
seated to the right of the experimenter at a table. 


boards were on the table in front of the 
The experimenter gave the following instruc- 


heard (inserts peg into a hole on the left board). Now 
I want you to put your little man in the same place on 
your board. Remember, put it in the same place as 
mine. 


On each trial, the experimenter placed her peg in a 
hole on the left board, and the child placed the other peg 
in a hole on the right board. Periodically, the child was 
, "Remember, put it in the same place as 


Eight children of each age were tested on boards of 
all three levels of complexity. These children were tested 
on the 2 X 2 hole boards during the first session, the 
4 X 4 hole boards during a second session, and the 6 X 
6 hole boards during their final session. A period 
of 2 to 4 days intervened between sessions. The re- 
maining subjects were divided into three groups, each 
consisting of eight children from each age range. Each 
of these groups was tested with boards of one of the 
complexity levels, with all testing of each child being 
completed in a single session. This design was selected 
in order to determine whether experience with the 
t boards influenced the strategies used for the 
complex ones. 
During sessions in which the 4 X 4 and 6 X 6 hole 
boards were used, there were 12 trials with the boards 
aligned horizontally with a 1-inch separation between 
them and 12 trials with the boards aligned diagonally 
with a 1-inch separation between the lower right corner 
of the left board and the upper left corner of the right 
board. During the session with the 2 X 2 hole boards, 
there were 6 trials with these boards in each alignment, 
red one preceded by 12 trials with a pair of 1 X 2 


For each level of complexity, the same positions of the 
peg were used for the horizontal and diagonal align- 
ments. The 12 positions used on the 4 X 4 and 6 X 6 
hole boards are shown in Figure 2. The order of posi- 
tions was randomized separately for the two alignments. 
Half of the three-session subjects and half of the sin- 
gle-session subjects in each age group received trials 
with boards aligned horizontally before trials with the 
diagonal alignment, and the remaining subjects received 
the reverse order. For subjects tested on all three 
levels, the order of board alignment was maintained 
across the three sessions. 
Each session took approximately 10 minutes. Two 
children from the younger group refused to participate 
and were replaced. All children who began the task 
completed it. 


Results 
Overall Accuracy: All Boards 


The results of central interest are the ef- 
fects of board alignment on both the accu- 
racy of performance and the pattern of re- 
sponses for each complexity level.1 For each 
complexity level, the effect of alignment on 
overall accuracy was examined by means of 
a 2 X 2 X 2 X 2 analysis of variance with 
alignment as a within-subjects variable and 
age, order of alignment conditions, and ex- 
perience (three-session vs. single-session 
condition) as between-subjects variables. 
The most striking result is that for the 2 X 2 
and 4 X 4 hole boards, children made far 
more errors when the boards were aligned 
horizontally than when they were aligned 
diagonally: 36% versus 18% of responses, 
F(1, 24) = 7.77, p < .02, for the 2 X 2 hole 
boards; and 48% versus 27%, F(1, 24) = 21.51, 
p < .001, for the 4 X 4 hole boards. Only for 
the most complex 6 X 6 hole boards was 
there no effect of alignment. Table 1 sum- 
marizes the effects of board alignment and 
age on performance for the three complexity 
levels. 

Older children tended to perform more 
accurately than younger children on boards 
of all levels of complexity (see Table 1); 
however, the age difference reached signifi- 
cance only for the 4 X 4 hole boards, F(1, 24) 
= 4.49, p < .05. It approached significance 
Li bo DIA = 3.98, p 
= .058. 

The difference between single-session and 
three-session subjects did not approach 
significance for any level of complexity, in- 


dicating that children with prior experience 
using ls comple nir hend 
inex children eir 

the 4 X 4 hole and 6 X 6 hole boards. There 
were also no effects of order of presentation 
of the two board alignments. 


1 Results of trials with the 1 X 2 hole boards are not 
crucial issues in this article and will not be dis- 
cussed. General results can be summarized as 
follows: Few errors were made when the boards were 
and 


Table 1 
Percentage of Responses in Each Category for Boards in Horizontal and Diagonal Alignment 

Horizontal alignment: Correct; Errors in row (left-right reversal, nonreversals); Other 
Diagonal alignment: Correct; Errors in row (left-right reversal, nonreversals); Errors in column (near-far reversal, nonreversals); Diagonal reversal; Other 

2 X 2 holes Y. 3 T 
76 14 — 4 6 
3½ years 62 36 — 1 m 3 
5 years 68 32 — 0 88 11 — 0 1 
4 X 4 holes 
3½ years 42 44 2 13 66 20 3 4 
5 years 64 31 2 4 81 B 1 5 
6 X 6 holes 
3½ years 
5 years 


Note. 


Pattern of Errors: 2 X 2 and 4 X 4 Hole 
Boards 


For the horizontally aligned boards, we 
examined the frequency of left-right rever- 
sals to determine whether children tended 
to make reversals around the vertical 
boundary between the model and the com- 
parison board, and we compared the fre- 
quency of left-right reversals with the fre- 
quency of in-row (in-line) errors that were 
not left-right reversals and of out-of-row 
(out-of-line) errors. For the diagonally 
aligned boards, we examined three types of 
reversals: mirror reversals around an im- 
agined diagonal boundary between the two 
boards (diagonal reversals), reversals around 
an imagined extension of the vertical edge of 
the experimenter’s board (left-right rever- 
sals), and reversals around an imagined ex- 
tension of the horizontal edge of the experi- 
menter's board (near-far reversals). The 
results of these analyses are shown in Table 1. 
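
The response categories defined above can be made concrete with a small classifier. This is a minimal sketch, not the authors' scoring procedure: the function name, the (row, column) coordinates counted from the top-left hole, and in particular the coordinate rule used for a diagonal reversal (a reflection across the line separating the two diagonally offset boards) are assumptions introduced for illustration.

```python
def classify_response(n, target, response, alignment):
    """Label one peg placement on an n x n board.

    target and response are (row, column) pairs counted from the top-left
    hole of each board; alignment is "horizontal" or "diagonal".
    Coordinates and category names are illustrative assumptions.
    """
    t_row, t_col = target
    r_row, r_col = response
    if response == target:
        return "correct"
    if r_row == t_row:
        # Same row: either the mirror-image column or some other column.
        if r_col == n - 1 - t_col:
            return "left_right_reversal"
        return "in_row_nonreversal"
    if alignment == "horizontal":
        # Out-of-row errors are not subdivided for horizontal boards.
        return "other"
    if r_col == t_col:
        if r_row == n - 1 - t_row:
            return "near_far_reversal"
        return "in_column_nonreversal"
    # Assumed rule: reflection across the boundary between boards that are
    # offset along the upper-left to lower-right diagonal.
    if (r_row, r_col) == (n - 1 - t_col, n - 1 - t_row):
        return "diagonal_reversal"
    return "other"


print(classify_response(4, (1, 0), (1, 3), "horizontal"))  # left_right_reversal
print(classify_response(4, (1, 0), (3, 2), "diagonal"))    # diagonal_reversal
```

Tallying these labels over trials, separately by alignment and age group, would give percentages of the kind reported in Table 1.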

Note to Table 1. Dashes indicate that the category of response could not occur for the 2 X 2 hole boards. 


As Table 1 indicates, for horizontally 
aligned 2 X 2 hole boards, 99.5% of the re- 
sponses were in the correct row; and it fol- 
lows automatically that almost all errors 
(98%) were left-right reversals of position 
within the correct row. For horizontally 
aligned 4 X 4 hole boards, 92% of the re- 
sponses were in the correct row; further, 
more than one half of the in-row responses 
by younger children and more than one third 
of the in-row responses by older children 
were errors, and 95% of these errors were 
left-right reversals. 

When diagonally aligned, these boards 
again predominantly produced in-row re- 
sponses: 96% and 89% of responses to 2 X 2 
and 4 X 4 hole boards, respectively. Al- 
though left-right reversals within the correct 
row were the most frequent type of error, 
they were considerably less frequent than for 
horizontally aligned boards, and this de- 
crease was almost entirely accounted for by 
an increase in correct responses (see Table 
1). 

There are three important characteristics 
of this pattern of results. First, there was 
little tendency to make in-row errors other 
than left-right reversals. Second, out-of- 
row errors were rare even in the diagonal 
alignment condition, when children could 
not locate the correct row simply by fol- 
lowing a sequence of holes. Third, the dif- 
ference in accuracy between the horizontal 
and diagonal alignment conditions is shown 
to be due to the greater frequency of left- 
right reversals in the former condition. 

This pattern of results is consistent with 
a tendency to make mirror-image responses 
when they result in a pattern that is sym- 
metrical around a prominent boundary be- 
tween displays, but this tendency cannot 
account completely for the observed results. 
For both alignments, children showed a bias 
toward placing their responses on the left 
side of their board, the side that was closer 
to the experimenter’s board. Thus, for the 
4 X 4 hole boards, responses to left-side 
placements were located on the correct side 
on 80% and 93% of the trials with horizon- 
tally and diagonally aligned boards, respec- 
tively; but responses to right-side place- 
ments were located on the correct side on 
only 42% and 68% of the trials with hori- 
zontally and diagonally aligned boards, re- 
spectively. These asymmetries were shown 
by both age groups. 


Pattern of Errors: 6 X 6 Hole Boards 


Analyses of types of responses comparable 
to those for the less complex boards are 
shown in Table 1. As the table shows, chil- 
dren made more in-row errors that were not 
left-right reversals and more out-of-row er- 
rors on the 6 X 6 hole boards than on the less 
complex boards, and out-of-row errors were 
particularly likely on the diagonally aligned 
6 X 6 hole boards. The increase in out-of- 
row responses did not reflect an increase in 
near-far or diagonal reversals. Further 
analyses of both near-far and left-right po- 
sitions of the responses support the view that 
difficulty in making distance distinctions 
contributed to the pattern of results for the 
6 X 6 hole boards. These analyses are de- 
scribed below. 


Table 2 
Percentage of Responses in Each Category of Left-Right Position for All Responses on the 6 X 6 Hole Board 

                    Horizontal alignment               Diagonal alignment 
Position in row              Left-right  Nonreversal            Left-right  Nonreversal 
and age             Correct  reversal    errors        Correct  reversal    errors 
End 
  3½ years             59       39           2            80       11           9 
  5 years              64       34           2            91        6           3 
Interior 
  3½ years             48       20          33            47       16          37 
  5 years              61       22          17            66       14          20 

Vertical (near-far) position of respon- 
ses. For the horizontally aligned boards, 
the frequency of out-of-row errors (14% of 
responses) was greater than for the less 
complex boards (see Table 1), but children 
were still significantly more likely to make 
in-row errors (37% of responses) according 
to a sign test, p(23 vs. 2) < .001. The nature 
of the out-of-row errors supports the view 
that children were attempting to follow the 
row containing the peg, since only 3 of 51 
vertical errors were displaced by more than 
one row. Furthermore, children were aided 
by the topological cue of adjacency to the top 
or bottom edge, since 98% of the responses 
to placements in the edge rows but only 81% 
of the responses to placements in interior 
rows were located in the correct row. 
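
The sign test reported above can be checked with an exact binomial computation. The sketch below assumes that "23 vs. 2" denotes 23 children who made more in-row than out-of-row errors and 2 who showed the reverse, with ties dropped, and that the test is two-sided; the article does not state these details, and the function name is illustrative.

```python
from math import comb


def sign_test_p(n_a, n_b):
    """Two-sided exact sign test for n_a vs. n_b non-tied cases.

    Under the null hypothesis each non-tied case is equally likely to fall
    in either direction, so the larger count follows Binomial(n, 0.5).
    """
    n = n_a + n_b
    k = max(n_a, n_b)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


p = sign_test_p(23, 2)
print(p < 0.001)  # the reported inequality holds under these assumptions
```

Under these assumptions the probability is far below .001, consistent with the reported result.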

For the diagonally aligned 6 X 6 hole ar- 
rays only, in-row errors (20% of responses) 
were not more frequent than out-of-row er- 
rors (28% of responses). Adjacency to the 
top or bottom edge strongly influenced the 
accuracy with which children located the 
peg's row. When the experimenter's peg 
was in the top or bottom row, 94% of the re- 
sponses were located in the correct row; 
however, when the experimenter's peg was 
in one of the four interior rows, the correct 
row was used in only 62% of the placements. 
Compared to their performance on hori- 
zontally aligned arrays, children were sig- 
nificantly less accurate at locating interior 
rows on the diagonally aligned arrays, F(1, 
24) = 20.27, p < .001, according to a 2 X 2 X 
2 X 2 (Age X Alignment X Order of Align- 
ments X Experience) analysis of variance. 

Lateral (left-right) positions of respon- 
ses. The lateral position of responses was 
considered separately for placements on the 
ends and in the interiors of rows. Table 2 
shows the pattern of lateral positions for 
in-row and out-of-row responses combined; 
the pattern for in-row responses alone is al- 
most identical. For end-of-row positions, 
the pattern is comparable to that for less 
complex boards: There were frequent 
left-right reversals (36% of responses) in the 
horizontal alignment condition; in the di- 
agonal alignment condition, there were fewer 
left-right reversals (8% of responses) and 
more use of the correct left-right position 
(86% compared to 61%). In both align- 
ments, errors in lateral position other than 
left-right reversals were rare. A 2 X 2 X 2 X 
2 (Age X Alignment X Order X Experience) 
analysis of variance on the accuracy of the 
lateral position of these responses yielded a 
highly significant effect of alignment, F(1, 
24) = 23.28, p < .001, but no other significant 
effects. 

As Table 2 shows, responses to the interior 
positions of the 6 X 6 hole boards accounted 
for most of the lateral errors that were not 
left-right reversals. These were the only 
locations for which the response pattern of 
the two age groups appeared to differ: for 
the horizontally aligned arrays, the younger 
children made fewer left-right reversals than 
other errors of lateral position (20% vs. 33% 
of responses), but the older children con- 
tinued to make more reversals than other 
lateral errors (22% vs. 17%). To determine 
if the younger children’s pattern resulted 
from difficulty in judging the distance of the 
interior holes from the sides of the board, we 
analyzed the number of responses that were 
the correct distance from a side regardless of 
the lateral edge used as a referent, that is, the 
sum of responses in the correct column and 
left-right reversals. According to a 2 X 2 X 
2 X 2 (Age X Alignment X Experience X 
Order) analysis of variance, older children 
were significantly more accurate than 
younger children at judging the distance of 
a peg from the side of the board, F(1, 24) = 
6.25, p < .025. However, there was also a 
significant interaction between age and ex- 
perience, F(1, 24) = 4.55, p < .05. According 
to post hoc comparisons using Duncan's new 
multiple-range test, there was an effect of 
age only for inexperienced children. 
Younger inexperienced children located 
their pegs the correct distance from a lateral 
edge on only 55% of the trials (barely more 
than the 50% expected if the children were 
selecting randomly among the four interior 
columns) in contrast to older inexperienced 
children, who maintained the correct dis- 
tance on 84% of their trials. None of the 
other differences reached significance; and 
for the experienced children, the means for 
the two age groups were almost identical: 
Younger children maintained the correct 
distance on 76% of the trials, and older chil- 
dren did so on 78% of the trials. 
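
The 50% chance level cited above follows from simple counting, sketched here under the assumption that a child picks one of the four interior columns of a 6 X 6 board with equal probability. A response counts as the "correct distance" if it falls in the target column or in its left-right mirror column. The function name and the 0-based column numbering are illustrative assumptions.

```python
from fractions import Fraction


def chance_correct_distance(n_columns, target_col):
    """Chance of keeping the correct distance from a lateral edge.

    Assumes uniform random choice among interior columns (0-based
    columns 1 .. n_columns - 2); both the target column and its mirror
    column preserve the distance from a side of the board.
    """
    interior = range(1, n_columns - 1)
    hits = {target_col, n_columns - 1 - target_col}
    count = sum(1 for c in interior if c in hits)
    return Fraction(count, len(interior))


# Any interior target on a six-column board has two qualifying columns of four.
print(chance_correct_distance(6, 1))
print(chance_correct_distance(6, 2))
```

Two of the four interior columns qualify for every interior target, which gives the 50% baseline against which the 55% figure for the younger inexperienced children is compared.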
Although responses to interior positions 
on the 6 X 6 hole board included significantly 
fewer left-right reversals in the diagonal 
alignment condition than in the horizontal 
alignment condition, F(1, 24) = 4.98, p < .05, 
the two alignments did not differ with re- 
spect to the frequency of correct lateral po- 
sitions. 
For both edge and interior positions on the 
6 X 6 hole boards, children tended to put 
their responses on the left side of their board, 
a bias comparable to that observed for the 
less complex boards. Thus, responses to 
left-side placements were located on the 
correct side for 82% of the placements on the 
horizontally aligned boards and 90% of the 
placements on the diagonally aligned boards; 
but responses to the right-side placements 
were located on the correct side for only 4 
and 67% of the placements on horizontally 
and diagonally aligned boards, respective- 
ly. 


Discussion 


In the current study, left-right reversals 
were the predominant error when children 
attempted to copy the location of a peg using 
a comparison board that was aligned hori- 
zontally with a standard display. Although 
children’s responses on 2 X 2 and 4 X 4 hole 
boards can be viewed as resulting from fol- 
lowing the correct row and within this row 
maintaining a distinction between end and 
interior position, for the more complex 6 X 
6 hole boards, this description is adequate 
only for responses of inexperienced 3½- 
year-olds. In contrast, 3½-year-olds who 
had experience with simpler arrays and 5- 
year-olds in both groups further differen- 
tiated interior positions on the 6 X 6 hole 
arrays, maintaining the correct distance of 
the peg from a lateral edge on a substantial 
proportion of trials even though they often 
appeared confused concerning the correct 
edge to use as a referent. These results 
suggest that children have a tendency to 
make systematic left-right reversals, but 
that this tendency is limited by the chil- 
dren's ability to discriminate among posi- 
tions. It seems probable that experience 
specifically with the arrays (for 3½-year- 
olds) or more general experience (for 5- 
year-olds) led to differentiation of the inte- 
rior positions of the 6 X 6 hole arrays. 

For all complexity levels, the use of di- 
agonally aligned arrays resulted in fewer 
left-right reversals, and this was accompa- 
nied by increased accuracy for all locations 
except those in the interiors of rows of the 6 
X 6 hole boards. These results support the 
conclusion that mirror-image confusions are 
more likely when the mirror images form a 
symmetrical pattern around a prominent 
boundary between them than under other 
circumstances. In addition, children located 
the peg’s row as accurately with diagonally 
aligned displays as with horizontally aligned 
ones, except when the peg was in an interior 
row of the 6 X 6 hole board. These results 
indicate that our subjects did not depend on 
being able to trace a sequence of holes for 
locating the near-far position of a peg in 
simple matrices. However, the children did 
make a greater number of errors in locating 
interior rows of the 6 X 6 hole boards when 
the boards were in a diagonal alignment, 
suggesting that the number of near-far po- 
sitions that could be distinguished was lim- 
ited and raising the possibility that near-far 
distinctions were made largely on the basis 
of combinations of topological cues such as 
near-to-self/far-from-self and edge/interior. 

In addition to the difference between 
alignments, for both alignments, children 
showed some bias toward placing their re- 
sponses on the left side of their board, that 
is, the side nearer to the experimenter's 
board. This bias suggests that children 
sometimes followed a row from the edge 
closer to the experimenter's array and se- 
lected the first position the correct distance 
from the edge or with the correct topological 
characteristics. However, this interpreta- 
tion must remain tentative, since the relative 
position of the child's and the experimenter's 
arrays was not counterbalanced. 

The present results are counter to 
Bryant's (1973, 1974) hypothesis that pre- 
school children are no more likely to make 
mirror-image confusions than other in-line 
responses, and they fail to support his posi- 
tion that children depend on the use of an 
in-line strategy. They do replicate, with a 
location copying task, a pattern of alignment 
effects found previously with figure-orien- 
tation discrimination tasks: a greater fre- 
quency of left-right reversal errors for the 
horizontal alignment condition and greater 
accuracy for a diagonal alignment condition. 
It remains to be explained why children's 
left-right mirror-image errors are more fre- 
quent under horizontal alignment conditions 
than under other conditions. The factor 
that seems most relevant is the presence of 
a salient vertical boundary between the ho- 
rizontally aligned displays that can serve as 
a spatial referent or "landmark." 

Young children probably reduce most di- 
rection discriminations to topological dis- 
tinctions of proximity to landmarks, for ex- 
ample, “toward-the-window” versus 
“toward-the-door.” Since young children 
frequently have difficulty distinguishing 
between their left and right sides, they seem 
particularly likely to rely on external land- 
marks for left-right distinctions. When 
children compare horizontally aligned dis- 
plays, they are likely to use the boundary 
between the displays as a referent and orient 
figures or judge distances so that they are the 
same with respect to this boundary. Under 
other alignment conditions, that is, in the 
absence of a salient common boundary be- 
tween displays, children are likely to use a 
referent that is on the same side of both 
displays. 


3 This interpretation was suggested by a 4-year-old’s 
explanation that symmetrically aligned horseshoes were 
going the same way because they were “like magnets 
sticking to the same place.” 

Use of the primitive topological strategy 
just described cannot account for one ob- 
servation concerning alignment effects, that 
stimulus alignment influences adults’ re- 
sponse times on mirror-image discrimination 
tasks in a pattern that parallels the effects of 
alignment on children’s accuracy (Wolff, 
1971). This result can be explained by the 
proposal that when individuals develop an 
internal frame of reference that does not 
depend upon the specific external landmarks 

in the visual field, their earlier use of land- 
marks also leads to the development of rel- 
ative orientation and relative position no- 
tions, that is, that figures are oriented or 
positioned the same way when they are the 
same with respect to a boundary between 
them. (For mirror images, this is equivalent 
to saying that “symmetrical” is one meaning 
of "same.") Indeed, there is evidence that 
adults maintain both internal or absolute 
and relative interpretations of “same or- 
ientation" but select the appropriate one as 
a task demands: Adults learn as easily to 
respond to symmetrically aligned mirror- 
image patterns as “same” and to patterns 
facing the same way as "different" as to re- 
spond to the opposite manner (Staller & 
Sekuler, 1976). Thus, in alignment studies 
with adults, longer latencies for symmetri- 
cally aligned stimuli could result from a need 
to suppress the irrelevant relative orienta- 
tion criterion when the two criteria for “same 
orientation” are in conflict. In the current 
study, it is possible that some children were 
beginning to use an internal frame of refer- 
ence in the diagonal alignment condition, 
but that the prominence of the boundary 
between the displays sometimes led these 
children to choose a relative position crite- 
rion in the horizontal alignment condition. 

In summary, two factors may contribute 
to the alignment effects in the current study. 
First, some children may judge left-right 
position in terms of proximity to landmarks, 
using the boundary between arrays as the 
landmark in the horizontal alignment con- 
dition but a landmark on the same side of the 
two arrays in the diagonal alignment condi- 
tion. Second, some children may use an 
internal frame of reference in the diagonal 
alignment condition; however, the promi- 
nence of the boundary between the arrays 
may sometimes lead these children to use a 
relative position criterion when comparing 
horizontally aligned arrays. 

In the current study, we assessed chil- 
dren's interpretation of “same” with respect 
to spatial location on two displays just as 
Barroso and Braine (1974) obtained infor- 
mation about children’s interpretation of 
“turned the same way” when comparing 
orientations of two figures. Neither study 
directly assessed children’s ability to learn 
to make correct left-right judgments when 
given feedback concerning the desired cri- 
teria. However, the symmetrical alignment 
condition that seems to evoke inappropriate 
or competing interpretations of same or- 
ientation and location is also the one for 
which learning of a mirror-image discrimi- 
nation is slower (e.g., Huttenlocher, 1967b; 
Sekuler & Rosenblith, 1964). Indeed, for 
discrimination learning tasks in which the 
child must remember the orientation of the 
correct alternative from trial to trial, learning 
is more rapid when two mirror-image alter- 
natives are presented successively than when 
they are presented simultaneously in sym- 
metrical alignment (Jeffrey, 1966; Thomp- 
son, 1975; Williams & Ackerman, 1971).4 

The accumulating evidence concerning 
the special nature of the symmetrical align- 
ment condition has clear practical implica- 
tions. Children’s workbooks and tests fre- 
quently include problems with a standard 
figure on the left and comparison figures 
lined up to the right. The research on 
alignment effects suggests that these prob- 
lems employ the most confusing spatial ar- 
rangement for teaching children to dis- 
criminate left-right mirror images, and that 
performance on these problems may not be 
a very good indicator of ability to discrimi- 
nate mirror images under other circum- 
stances. 


4 It should be noted that in these studies, both si- 
multaneous and successive conditions required memory 
of the correct orientation from trial to trial. In contrast, 
on matching-to-sample tasks in which successive ver- 
sions require memory for orientation but simultaneous 
versions do not, it is generally found that successive 
problems are more difficult than simultaneous ones 
(e.g., Bryant, 1973; Wechsler & Hagin, 1964). 


Is Speed of Processing Information 
Related to Intelligence and Achievement? 


Mona R. Spiegel and N. Dale Bryant 
Teachers College, Columbia University 


Speed of processing information, measured by mean response time and slope 
of response times for tasks of varying levels of complexity, was correlated with 
intelligence (measured by the Lorge-Thorndike, a group test emphasizing 
power rather than speed) and with achievement (measured by Stanford 
Achievement Test scores in reading and mathematics). Three tasks were in- 
dividually administered to 94 sixth-graders, 51 males and 43 females. Reli- 
ability of mean response time (ru = .9) was greater than that of slope, and cor- 
related significantly with IQ (r = -.6) and with achievement (r = -.4 to -.5). 
Correlations with achievement dropped to near zero with IQ partialled out. 
Speed of processing information was found to generalize across experimental 
tasks and to reliably indicate intellectual ability. 


Quickness in performing an intellectual 
task is often considered a characteristic of 
bright individuals. Whether in learning new 
material or in recalling previously learned 
material, individuals who respond more 
quickly are often viewed as more intelli- 
gent. 

In recent years, especially, there has been 
increased emphasis on measuring academic 
potential in terms of learning rate. Indi- 
vidualized instruction, programmed learn- 
ing, and groupings of fast and slow learners 
are only a few examples of educational 
techniques that take learning rate into ac- 
count. They demonstrate that increasingly 
greater recognition is being given to the idea 
that speed of processing information may be 
a characteristic of learning ability that is 
related to general intelligence. 

Many psychometric instruments measure 
intellectual ability in terms of both the speed 
and accuracy with which one completes a 
given task. In most group-administered 
intelligence tests, for example, students are 
instructed to answer correctly as many 
questions of increasing difficulty as possible 


This study is based on a dissertation by Mona R. 
Spiegel under the direction of N. Dale Bryant in partial 
fulfillment of the requirements for the doctoral de- 
gree. 

Requests for reprints should be sent to Mona R. 
Spiegel, 131 Bennett Avenue, New York, New York 
10033. 


Copyright 1978 by the American Psychological Association, Inc. 0022-0663/78/7006 -0904$00.75 




within a given amount of time. The student
who responds slowly will answer fewer
questions and receive a lower score than a
peer who is able to answer more questions in
the same amount of time. The resulting
score, therefore, depends on the speed with
which the information is processed as well as
the accuracy of the response. In fact, when
time scores have been added to the number-right
scores on Raven’s Progressive Matrices,
correlations with other intelligence tests
have been found to increase (Tonn, 1 
Wolf & Stroud, 1961). Hunt, Lunneborg, and
Lewis (1975) also point out that “although
a verbal intelligence test is directly a measure
of what people know, it is indirectly a way of
identifying people who can code and manipulate
verbal stimuli rapidly in situations
in which knowledge per se is not a major
factor” (p. 223).

Traditional tests of intelligence have become
the focus of public censure primarily
because of their reliance on knowledge that
is not equally available to all individuals. In
the search for substitute criteria, there has
been increased interest in the possibility that
a measure of speed of processing information
may be used to predict intellectual ability.
Some researchers have attempted to measure
information-processing activity by
means of EEG latencies of evoked responses
to repeated external stimuli (see Crawford,
1974). And indeed, significant correlations
(r = .3) have been found between conventional
intelligence tests and latency of averaged
evoked responses to visual stimuli
(see review by Callaway, 1973). Other researchers
have attempted to measure speed
of processing information by measuring response
time to tasks that utilize highly
overlearned material. These studies, too,
have found that the average response times
are significantly different for high- and
low-verbal ability groups of adults on a variety
of tasks: solving verbal and mental
arithmetic problems, matching upper- and
lowercase letters, identifying homophones,
and identifying words of the same taxonomic
category (Goldberg, Schwartz, & Stewart,
1977; Hunt, Frost, & Lunneborg, 1973; Hunt
et al., 1975). The results suggest that individuals
with greater intellectual ability are
more rapid at processing conceptual infor- 
mation than are peers with lesser intellectual 
ability. In sum, speed of processing infor- 
mation, measured either physiologically or 
behaviorally, may be a factor in intelli- 
gence. 


In addition to any relationship found be- 
tween average response time and intelli- 
gence, the amount of increase in response 
time as the items in a task become more
complex may be related to intelligence. 
Scott (1940), for example, found that his low 
IQ group increased their response times from
the easier to the more complex light-button
combinations much more than did the high
IQ groups. Eysenck (1967) also predicts
that the slope of the regression line, showing
increase of response time as a function of
amount of information processed, will correlate
negatively with intelligence.

The purpose of the present study was to 
explore more systematically the relationship 
of speed of processing information to intelligence
and achievement. In particular, this
study sought to answer the following question:
For correct responses to tasks of
different levels of complexity (each of which
could be performed correctly by all subjects
under untimed conditions), to what extent
are (a) mean response time and (b) rate of
increase in response time with increasing
item complexity (slope) related to (a) intelligence
(as measured by a group intelligence
test with generous time limits and yielding
verbal, nonverbal, and composite IQ scores)
and (b) achievement (as measured by standardized
timed achievement tests in reading
and mathematics)?


Method 


Subjects 


There were 94 subjects included in the study, 51 
males and 43 females. All subjects were students in the 
sixth grade, with an age range from 10 years 7 months 
to 12 years 6 months (M = 11 years 2 months), and all 
spoke English as a first language. All demonstrated 
knowledge of the concepts used in any of the experimental
tasks (e.g., “above” and “below”). In order to
obtain a heterogeneous sample with respect to IQ, four 
parochial schools sampling different populations from 
a large urban setting were used. The mean Composite 
IQ score (Lorge-Thorndike Intelligence Test, Multi- 
Level Edition) for the sample as a whole was 116.7, with 
a standard deviation of 14.16 and a range from 80 to 148.
The mean IQ score for the sample was thus above that
of the general population, but the distribution of scores 
on observation reflected a normal distribution. 


Tasks 


Three kinds of tasks were used in order to determine 
whether the relationship between response time and the 
criterion measures was task specific. Each task was 
divided into four levels of increasing complexity, with 
eight items in each level. The items were administered
under timed conditions to approximately 20 subjects,
all of whom spoke English as a first language. Within
each complexity level, items whose response times deviated
more than ±.5 sec from the mean response time
of that level were discarded. In addition, the tasks were
pretested to ensure that all items could be performed
by all subjects at the sixth-grade level, by means of an
untimed administration of the tasks to a separate
sample of 20 sixth-graders.
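
The item-screening rule described above can be sketched in Python. This is only an illustration under stated assumptions: the function name, the item labels, and the sample times are hypothetical, and the level mean is assumed to be computed once, before any item is discarded (the text does not say whether it was recomputed).

```python
from statistics import mean


def screen_items(item_times, tolerance=0.5):
    """Keep items whose response time lies within `tolerance` seconds
    of the mean response time of their complexity level.

    item_times maps a (hypothetical) item label to its pretest response
    time in seconds. The level mean is computed once, over all items.
    """
    level_mean = mean(list(item_times.values()))
    return {
        item: t
        for item, t in item_times.items()
        if abs(t - level_mean) <= tolerance
    }


# Hypothetical pretest times (sec) for one complexity level.
pretest = {"a": 4.0, "b": 4.2, "c": 4.4, "d": 5.6}
kept = screen_items(pretest)
```

With these made-up times the level mean is 4.55 sec, so items "a" and "d" deviate by more than .5 sec and are dropped.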

The first task was a sentence-picture comparison 
task based on a model proposed by Clark and Chase 
(1972): The subject reads a simple statement about the 
spatial position of symbols displayed vertically (e.g., * 
ABOVE +), then looks at the picture of the symbols (here, * shown above +),
and determines whether the statement is an accurate 
description of the picture or not. (In the above exam- 
ple, the correct response is true.) The levels increase 
in complexity by the addition of a negative to the sen- 
tence (e.g., * NOT BELOW +) and/or an increase in the 
number of referent symbols from two to three (*, +, and 
0). This task was chosen because it involves minimal 
reading (the words above, below, and, at the higher 
levels, not) and only that degree of conceptual knowledge
pertinent to the terms used.

The second task was pictorial similarities and dif- 
ferences. Here the subject is shown four pictures (A, 
B, C, and D), three of which are similar or related in 
some way, and is asked to indicate which picture is 
different. At the simplest level the three pictures are 
almost identical and are readily distinguishable from
the fourth one (e.g., three jackets versus a dress). At
a more complex level the common trait that links the 
three pictures and distinguishes them from the fourth 
one is more difficult to discern (e.g., three four-legged 
animals versus a bird). The pictures used here are 
similar to items found in many measures of conceptual 
ability. 

The last task, matrix analysis, was based on progres- 
sive matrices similar to those in Raven's Progressive 
Matrices. It is comprised of spatially related figures 
which require educing the relation among the figures 
and then choosing the appropriate figure to complete 
the pattern. The shape of the cutouts was identical to 
the shape in items from the Progressive Matrices, but 
there were four (A, B, C, and D) instead of six alterna- 
tive answers. The items were analyzed according to 
content and, on the basis of Penrose and Raven's (1936) 
original study, separated into ascending levels of diffi- 
culty. Original items were devised, and the task was 
pretested, as stated above. 


Apparatus 


The stimuli for each task were projected onto a portable
screen, and subjects responded by means of
pressing (with the index finger of the preferred hand) 
one of four labeled response buttons that were in front 
of the screen. For the first (sentence-picture com- 
parison) task, two of the buttons were labeled true and 
false, and the remaining two were unused. For the 
other two tasks (pictorial similarities and differences, 
and matrix analysis) all the buttons were used and were, 
respectively, labeled A, B, C, and D. A large X was 
printed equidistant to the buttons, indicating where the 
subject's finger should rest in preparation for the button
presses.

An electric stop clock (Lafayette Model 58007) was
used to measure response time, from initial presentation
of the stimulus to the button press, to the nearest hundredth
of a second.


Procedure 


All subjects were given the Lorge-Thorndike Intelligence
Test, Multi-Level Edition, Level D, a group
intelligence test which is administered with generous 
time limits and which emphasizes power rather than 
speed. It correlates .8 with the Stanford-Binet and the 
Wechsler Intelligence Scale for Children (WISC), and
yields three IQ scores: Verbal, Nonverbal, and Com- 
posite (Thorndike, Note 1). Achievement scores 
(Stanford Achievement Test, 1973) were used for 
reading comprehension, mathematics concepts, and 
mathematics computation.

After training and successful performance with
sample items, each subject was tested individually with
the experimental tasks. Instructions varied according 
to the specific task, but several instructions were com- 
mon to all conditions: (a) The subjects were asked to 
press the correct response button as quickly as possible. 
(b) The subjects were told not to guess because, if they 
did guess, “it would not count.” (c) All subjects were 
verbally approved for a correct response (e.g., “good” 
or “fine”) and were told when an error occurred.


Treatment of Data


Although all tasks could be performed by all subjects 
under untimed conditions, some errors did occur under 
timed conditions, and response times for incorrectly 
answered items were not used in the person's score.
Median response times were calculated in each level, 
and mean response time was computed by taking the 
average of the medians for each of the four levels of 
complexity. The rate of increase in response time (or 
slope) over the medians of the four levels was computed 
by means of the Theil statistic (Hollander & Wolfe, 
1973, pp. 205-206). Median response times increased 
from 4.5 to 8.3 sec, in approximately equal intervals of 
1 sec, in the sentence-picture comparison task. In the 
other two tasks, however, median response times for the 
two lowest complexity levels were found to be so close 
as to argue against their being considered separate 
levels. Collapsing these two levels and computing a 
slope based on the median response times for three 
levels resulted in such low reliabilities that the slopes 
for these tasks were not used. Only the slope for the 
sentence-picture comparison task was considered in the 
analysis of data. 
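
The scoring steps described above can be sketched in Python. This is a minimal illustration rather than the authors' own procedure: the function names and the sample times are hypothetical, the complexity levels are assumed to be coded 1 through 4, and the Theil statistic is taken here to be the median of all pairwise slopes between level medians.

```python
from itertools import combinations
from statistics import mean, median


def level_medians(times_by_level):
    """Median response time (sec) of the correctly answered items per level."""
    return [median(times) for times in times_by_level]


def mean_response_time(medians):
    """Average of the level medians."""
    return mean(medians)


def theil_slope(medians, levels=None):
    """Median of the pairwise slopes between (level, median) points."""
    if levels is None:
        levels = range(1, len(medians) + 1)  # assumed coding: 1, 2, 3, 4
    points = list(zip(levels, medians))
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
    ]
    return median(slopes)


# Hypothetical correct-response times (sec) for one subject, four levels.
subject = [[4.0, 4.5, 5.0], [5.0, 5.5, 6.0], [6.5, 7.0], [8.0, 8.5, 9.0]]
meds = level_medians(subject)
mean_rt = mean_response_time(meds)
slope = theil_slope(meds)
```

Working from medians at both stages keeps a single unusually slow response from dominating either the mean response time or the slope.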


Results 


Reliability and Intercorrelations 


Mean response time was found to be a 
very reliable measure, with split-half reli- 
ability coefficients averaging about .9 (see 
boldface diagonal values in Table 1). The 
mean response times for the three separate 
tasks also intercorrelated to a moderately 
high degree (approximately .6), suggesting 
some generality of a factor for speed of pro- 
cessing information. 

Slope of the response times for the sen- 
tence-picture comparison task proved to be 
less reliable than mean response time (r_tt =
.7). The correlation between mean response
time and slope for this task (r = .8) approached
unity, when corrected for attenuation.
Thus slope did not add much to the
mean as a measure of processing speed. 
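
The two standard psychometric adjustments used in this section, the Spearman-Brown correction of a split-half correlation and the correction of a correlation for attenuation, can be sketched in Python. The function names are hypothetical, the half-test correlation of .8 is a made-up input, and the attenuation example uses the rounded values given in the text (r = .8, reliabilities of about .9 and .7), so it only approximates the published figures.

```python
from math import sqrt


def spearman_brown(r_half, factor=2.0):
    """Step a split-half correlation up to a test `factor` times as long."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores from an observed
    correlation r_xy and the reliabilities r_xx and r_yy."""
    return r_xy / sqrt(r_xx * r_yy)


# Hypothetical odd-even correlation stepped up to full length.
full_length = spearman_brown(0.8)

# Rounded values from the text: mean response time vs. slope.
corrected = correct_for_attenuation(0.8, 0.9, 0.7)
```

With the rounded inputs the corrected value comes out at about 1.01, which is in line with the statement that the correlation approached unity; the slight excess over 1 comes from rounding the inputs.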


Relationship to Intelligence and 
Achievement 


Mean response time was found to correlate
substantially with Composite IQ scores
on the Lorge-Thorndike Intelligence Test
(r = -.6), and slightly, but consistently,
higher with Nonverbal IQ than with Verbal
IQ scores (see Table 1). This was true not
only for the pictorial similarities and differences,
and matrix analysis tasks, but for
the more verbally loaded sentence-picture
comparison task as well.

Correlations between mean response time
and achievement scores were, in general,
slightly lower than between mean response
time and IQ (see Table 1). Correlations with
reading comprehension scores were approximately
-.5 for all three tasks, while
correlations of mean response time with
mathematics concepts and mathematics
computation were about -.4.


Table 1


Variable                            1     2     3     4     5     6     7     8     9    10

 1. Sentence-picture M            .94   .68     ?   .98     ?     ?     ?     ?     ?     ?
 2. Similarities & differences M  .62  .88a   .67   .56  -.63  -.52  -.63  -.47  -.35  -.35
 3. Matrix analysis M             .69   .60  .92a   .57  -.64  -.54  -.64  -.50  -.44  -.45
 4. Sentence-picture slope        .78     ?   .45  .68a  -.59  -.55  -.54  -.53  -.42  -.40
 5. Composite IQ                 -.62  -.57  -.59  -.47   .93   .99  1.00   .70   .70   .60
 6. Verbal IQ                    -.55  -.47  -.50     ?   .92   .93     ?   .56   .57   .46
 7. Nonverbal IQ                 -.60  -.57  -.59  -.43   .93   .70   .93   .55     ?   .65
 8. Reading comprehension        -.53  -.43  -.47  -.43     ?     ?   .52   .95  .72b   .67
 9. Math concepts                -.48  -.30  -.39  -.32   .62   .51   .64   .59  .85b     ?
10. Math computation                ?     ?     ?     ?   .55   .42   .59   .53     ?     ?

Note. N = 94. Values above the diagonal are corrected for attenuation. Diagonal values (in boldface) are reliability coefficients.
a Split-half (odd-even) correlations, corrected to full length by Spearman-Brown formula.
b Intercorrelations and split-half (odd-even) correlations taken from publishers' manuals.


Slope of the response times for the sentence-picture
comparison task correlated
moderately with IQ (r = -.5), and about
equally with Verbal and Nonverbal IQ.
When corrected for attenuation, the slope for
sentence-picture comparison correlated as
highly with IQ as did mean response time.
Lower correlations were obtained between
the slope for this task and achievement (r =
-.4 to -.3) than between slope and IQ.

Since the achievement tests were timed 
and the intelligence test was relatively un- 
timed, one might expect that response time 
would be more highly related to the 
achievement than to the IQ scores. The 
lower correlations obtained between response
time (both mean response time and
slope) and achievement than between response
time and IQ, therefore, suggest that
the relationship between response time and
the criterion measures was due not just to an
effect of speed, but also to intellectual
ability.


Controlling for IQ, partial correlation 
coefficients between mean response time and 
achievement scores drop virtually to zero. 
When controlling for mean response time, 
partial correlation coefficients between 
Composite IQ and achievement remain sig- 
nificantly different from zero (r = .4-.5). 
This is true whether one controls for the 
mean response time for any one task or for 
all three tasks. Thus, about half of the 
variance common to intelligence and 
achievement, as measured here, is explained 
by a factor relating to mean response time. 
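
The first-order partial correlations reported above follow a standard formula, sketched here in Python. The function name is hypothetical, and the three input correlations are rounded, illustrative values in the range the text reports (response time with achievement about -.45, response time with IQ about -.6, IQ with achievement about .6); they are not the exact table entries.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# Illustrative, rounded inputs (assumptions, not table values).
r_rt_ach = -0.45   # mean response time with achievement
r_rt_iq = -0.60    # mean response time with IQ
r_iq_ach = 0.60    # IQ with achievement

# Response time with achievement, controlling for IQ.
rt_ach_given_iq = partial_corr(r_rt_ach, r_rt_iq, r_iq_ach)

# IQ with achievement, controlling for response time.
iq_ach_given_rt = partial_corr(r_iq_ach, r_rt_iq, r_rt_ach)
```

With these inputs the first partial is about -.14 and the second about .46, the same pattern the text describes: the response time and achievement relationship shrinks toward zero once IQ is controlled, while the IQ and achievement relationship stays well above zero once response time is controlled.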

Multiple correlations for mean response 
time to all three tasks were not much higher 
than the zero-order correlations between 
mean response time for any one task and IQ 
or achievement (r = .5 to .7). This suggests
that the relationship between response time
and the dependent variables remains about
the same regardless of the conceptual task
as long as the [...]

[...] level, median response time for each level of
complexity correlated with IQ
(usually r = -.4 to -.6) and to a somewhat
lower degree with achievement (usually r =
-.3 to -.5). In none of the tasks did the correlations
between median response time and
the dependent variable increase as a function
of complexity level (see Table 2). This
suggests that the relationship between response
time and intelligence is not dependent
on degree of complexity, as long as some
minimal level of conceptual ability is involved
in the task.


Table 2


Pearson Product-Moment Correlations for Median Response Times (By Level of Complexity)
With IQ and Achievement


                            Sentence-picture      Pictorial similarities
                              comparison^a          and differences^a      Matrix analysis^a
Variable                    1    2    3    4        1    2    3    4        1    2    3    4

Composite IQ              -56  -57  -58  -58      -42  -54  -55  -39      -47  -62  -44  -48
Verbal IQ                 -48  -49  -50  -62      -31  -45  -44  -35      -41  -52  -40  -38
Nonverbal IQ              -56  -56  -56  -55      -46  -56  -66  -37      -47  -62  -42  -5?
Achievement
  Reading comprehension   -44  -48  -51  -51      -25  -38  -45  -33      -41  -45  -39  -3?
  Math concepts           -47  -43  -46  -43      -11  -31  -40  -17      -25  -37  -39  -2?
  Math computation        -45  -47  ...  -36  ...  -26  -37  -27

Note. Decimal points have been omitted from the data. N = 94.
a The numbers 1, 2, 3, and 4 indicate complexity level.


Related Findings


The mean percentage of correct responses
under untimed conditions (when administered
to a sample of 20 sixth-graders) was
virtually 100%, and under timed experimental
conditions was very high (about 90%)
for all three tasks. This comparison
suggests that incorrect responses under
timed conditions were the result of pressure
to respond as quickly as possible.

The correlation between mean response
time and the number of correct responses
ranged from 0 to -.28 among the tasks, but
did not increase as a function of the levels of
complexity within each task. It thus appears
that there is no trade-off between
speed and accuracy of performance for the
tasks.

No significant differences were found
between male and female subjects on
the variables examined.


Discussion


A substantial relationship (r = -.6) was
found between the speed of processing information
on three different tasks (as measured
by mean response time) and intelligence
(as measured by a relatively untimed
group intelligence test). If a timed intelligence
test were used, the correlations with IQ
would be higher. (In a supplementary study
correlating response time on the sentence-picture
comparison task with Otis-Lennon
Mental Ability Test scores, correlations of
-.7 to -.8 were obtained.)

The results of this study also suggest that
the relationship (r = -.4 to -.5) between mean
response time and achievement in reading
and mathematics is attributable to intellectual
ability. However, there is considerable
common variance between IQ and
achievement that is not related to response
time. This common variance may be due to
factors such as knowledge of a particular
subject matter, reading ability, and so on.

Since mean response time of the different
tasks correlated to the same extent with
Nonverbal IQ as with Verbal IQ, regardless
of whether the task contained verbal components,
it appears that what is being measured
here is not primarily determined by
the content of the items. This is reflected
also in the moderately high intercorrelation
of mean response time for the different tasks
(r = .6) and in the finding that the correlation
of mean response time on a single task
with IQ or achievement was just as high as
the multiple correlation of all three tasks
with the same criteria. These results confirm
the work of earlier researchers who
proposed the existence of a general speed
factor in intelligence (DuBois, 1995;
McFarland, 1930; Miyajima, 1972; Porebski,
1954, 1960), and demonstrate that speed of
processing information (as measured by
mean response time) is a factor that generalizes
across very different conceptual
tasks.

On the other hand, response time on these
tasks evidently reflects factors other than a
general speed factor alone. This is seen in
the finding that the reliabilities of mean response
time on all the tasks were considerably
higher than their intercorrelations.

The relationship between speed of processing
information and intelligence does not
appear to be dependent on the degree of
complexity within the task. This is evident
in the finding that despite the increasing
complexity of levels within the tasks, median
response time for correct responses in each
level, in general, correlated to about the same
degree with IQ and achievement. Hence, as
long as some minimal level of conceptual
ability is involved, increasing the complexity
of the task does not alter the relationship
between speed of processing information (as
reflected in response time) and intelligence.

Slope of the response times did not add
any new variance to the relationship between
mean response time and IQ. The one reliable
measure of slope (that for the sentence-picture
comparison task) was not as
reliable as mean response time, and it correlated
to a similar extent as mean response
time with IQ and achievement. However,
there are problems in obtaining a measure of
slope: In order to evaluate the relationship
of slope of response time to intelligence, it
may be necessary to devise tasks that contain
more levels of complexity and greater increments
between levels. Yet the range of
complexity that one can sample in a response
time measure, using items that everyone
can perform, is necessarily limited by the
abilities of the subjects. Hence it may not
be possible, within the framework of measuring
speed of processing information, to
obtain a robust measure of slope.

There are two alternative explanations of
the moderately high relation between speed
of processing information and intellectual
ability. On the one hand, speed of processing
information may be only one of the
many characteristics of general intelligence;
intelligent individuals may just happen to
have faster processing times. On the other
hand, speed of processing information may 
in itself be a factor that increases intelligence 
because it increases efficiency of functioning. 
Individuals who process information more 
quickly may be able to process more infor- 
mation than others in the same amount of 
time. In a classroom, they will progress at
a faster rate than other students as newer, 
more difficult material is learned; in a test 
situation (including many intelligence tests 
that reflect the amount of information pre- 
viously learned), they will demonstrate 
greater acquisition of knowledge because 
they have been able to process more infor- 
mation than others given the same amount 
of exposure time. Neither of these inter- 
pretations alone may suffice to explain the 
relationship between speed of processing 
information and intelligence. Rather, it is 
likely that both interpretations are valid, 
that is, that speed of processing information 
is both a correlate of and a direct contributor
to intellectual functioning. In any case, the 
psychometric implications remain the same: 
Speed of processing information, as mea- 
sured by response time, may be used as a 
reliable estimator of intellectual ability. 

In summary, the present study found 
mean response time to be highly reliable for 
all the tasks (r_tt = .9) and to correlate about
as highly with the Lorge-Thorndike Intelligence
Test (r = .6) as do other intelligence
tests with each other (r = .5-.9). Although 
this study is limited to the types of subjects 
and types of tasks used, it does raise the 
possibility that speed of processing infor- 
mation (as measured by response time) can 
be used to supplement traditional tests of 
intelligence. At the present time, many 
schools prohibit the use of conventional 
tests, primarily because of their fear of cul- 
tural bias. A response time measure that 
correlates significantly with intelligence and 
relies far less on the acquisition of specific 
kinds of knowledge may be less subject to 
criticism of cultural bias. Given tasks in 
which all subjects can perform all the items, 
such a measure is “relatively neutral with 
respect to knowledge” (Goldberg et al., 1977, 
p. 14) and may prove to be useful in situa- 
tions in which conventional intelligence tests 
cannot be used.


Reference Note


1. Thorndike, R. L. Personal communication, March
14, 1977.


Sixty student teachers and 137 practicing teachers rated the severity of classroom
misbehaviors ascribed to either a black or white attractive or unattractive
child. Prior to student teaching, student teachers displayed no significant
differences in ratings of black and white children. Following student
teaching, ratings of black children's transgressions significantly increased in 
severity, and ratings of white children's transgressions remained the same. 
Practicing teachers were affected by student attractiveness but not by student 
race, with transgressions by attractive children of both races being rated more 
severely than transgressions by unattractive children. Practicing teachers 
with BA degrees rated transgressions more severely than did practicing teach- 
ers with graduate degrees, and all practicing teachers rated tantrum behavior 
more severely than classroom stealing behavior. 


The systematic investigation of the effects
of physical features on human interactions
is a relatively recent phenomenon
(Berscheid & Walster, 1974). The majority
of studies have been concerned with the
physical attractiveness variable and with its
relationship to opposite sex attractions
(Walster, Aronson, Abrahams, & Rottmann,
1966), to the attributions of personality traits
by adults (Dion, Berscheid, & Walster, 1972)
and children (Dion & Berscheid, 1974), and
to the judgments of severity of transgressions
in social settings like the courtroom (Efran,
1974) and the classroom (Dion, 1972). In
general, the findings support what Dion et
al. (1972) labeled as the “what is beautiful is
good” phenomenon.

The subtle but powerful effects of students’
physical traits on teachers’ perceptions
and judgments are just beginning to be
understood. Clifford and Walster (1973),
for example, gave teachers objective information
about a child’s academic and social
potential along with a photograph of either
an attractive or unattractive boy or girl.
Teachers were then asked to state their expectations
of the child's performance. Attractive
children of both sexes were judged
to be more interested in school, more likely
to succeed, more intelligent, more popular
with classmates, and as having parents who
were more interested in education. In a
similar study, Dion (1972) presented information
describing either a mild or severe
transgression ostensibly committed by a
7-year-old whose picture was included.
Again, both the sex and physical attractive- 
ness of the children were varied. Subjects, 
female college students, were asked to eval- 
uate the child's potential for future misbe- 
haviors. When the transgressions were se- 
vere, more chronic antisocial behavioral 
dispositions were attributed to unattractive 
children. Transgressions were evaluated 
less severely when committed by an attrac- 
tive child. Interestingly, no differences in 
the intensity of advocated punishment were 
found.

A more recent study in this area by Rich 
(1975) used experienced teachers as subjects 
and varied the sex and physical attractive- 
ness of photographs which were accompa- 
nied by a vignette describing a misbehavior 
“possibly committed” by the child. 
Teachers were also provided with a report
card characterizing the child as a good, satisfactory,
or poor student. They were then
asked to rate the pupil for blame, personal- 
ity, and punishment. The results, as ex- 
pected, were that attractive children re- 
ceived more desirable personality ratings. 
However, when the discrepancy of a negative 
report card was introduced, attractive chil- 
dren of both sexes were more readily blamed 
for severe misbehaviors. With regard to sex, 
Rich found that unattractive girls were 
evaluated more leniently than unattractive 
boys. 
Although a few studies have dealt with 
racial variables in the classroom and have 
generally confirmed the white teacher's bias 
against black children (Brown, 1968) and 
especially against gifted black children 
(Rubovits & Maehr, 1973), the relationship 
between race and physical attractiveness has 
received little attention. Cross and Cross 
(1971), studying the perception of facial 
beauty, found that subjects of both races, 
regardless of age or sex, tended to rate white 
faces more positively than black ones. 
However, whether this race-related standard 
of beauty influences the attributions and 
expectations of others, like teachers, is not
known. A study by Fraser and Taylor
(1974) does suggest that ethnic variables can 
be related to the attribution of personality 
traits and responsibility for behavior. These 
authors found that English and French Ca- 
nadian adults tended to make more favor- 
able attributions of motives to members of 
their own ethnic groups. Moreover, the at- 
tributions were found to be consistent with 
in-group stereotypes held by the other 
group.
Considering, therefore, that race may be 
a significant variable in the determination 
of physical attractiveness, it follows that race 
can also be a variable in the evaluation of 
motives and attribution of responsibility. In 
other words, if blacks are perceived as less 
attractive than whites (Cross & Cross, 1971), 
greater responsibility for negative outcomes 
and less desirable motives may be attributed 
to them. The operation of this bias in an 
educational setting is the subject of this 
study. It is hypothesized that the trans- 
gressions of unattractive children will be 
rated more severely than those of attractive
children and that black children will be less
positively evaluated than white children of
comparable attractiveness. Furthermore,
since previous studies used subjects ranging
from college undergraduates to practicing
teachers without any attention to years of
experience or highest degree earned, it was
decided to incorporate these as independent
variables. It was hypothesized that anticipated
effects would accrue regardless of
whether subjects were student teachers or
practicing teachers of varying levels of experience.


Method 


Subjects 


Experienced teacher subjects were 137 practicing
teachers from the greater St. Louis public schools. The
distribution of this sample by race, sex, years of experience,
and highest degree earned was 36 black and 101
white; 36 male and 101 female; 33 with 0-5 years experience,
66 with 6-15 years experience, and 35 with above
15 years experience (3 participants left “years of experience”
blank); and 56 with BA degrees, 56 with MA
degrees, and 3 with PhD degrees (22 participants left
“highest degree earned” blank). Student-teacher
subjects were 54 white females and 6 white males from
the Education Department of the University of Missouri-St.
Louis.


Procedure 


Practicing-teacher and student-teacher participants
were administered the same instrument designed to
assess the severity of teachers’ reactions to students’
transgressions. Practicing teachers were administered
this instrument during professional development
workshops. Student teachers were administered the
instrument during orientation to student teaching and
again 16 weeks later at the conclusion of student
teaching. The instrument consisted of a three-page
booklet. The first page depicted a fabricated “incident
report” professionally lettered to appear to be from a
school in Washington, D.C. It contained one of two
incidents, either a tantrum thrown in class or the
stealing of lunch money from a teacher’s desk. In the
upper right-hand corner was a picture of a 10-year-old,
fourth-grade, black or white male child previously
judged to be either attractive or unattractive. The
child’s name, teacher’s name, and recommendation were
blacked out. The second page consisted of 15 state-
ments regarding the severity of the incident (“Com-
pared to everyday classroom behaviors, the misbehavior
described is not serious.”), disciplinary action recom-
mended (“I would normally recommend suspension
from school for such an incident.”), or character eval-
uation (“Considering the situation, it is likely that the
child will indulge in behavior similar to this in the fu-
ture.”). Each statement was followed by a 5-point
Likert-type scale ranging from strongly agree to strongly
disagree. Of the 15 items, 6 were stated positively and
9 negatively in order to minimize subjects’ response bias.
Page 3 consisted of the Pupil Control Ideology Scale, a
teacher authoritarianism scale developed by Willower,
Eidell, and Hoy (1973). This was included as a measure
of concurrent validity for the 15-item scale.
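
The scoring of the 15-item scale can be illustrated with a short sketch. It assumes that agreement is coded from 1 to 5 and that the negatively stated items are reverse-scored before summing; the article does not say how responses were coded or which items were negatively worded, so the function name, the coding, and the item positions below are all hypothetical.

```python
def total_severity_score(responses, reverse_items, scale_max=5):
    """Sum Likert responses after reverse-scoring the listed items.

    responses: list of integer ratings, each from 1 to scale_max.
    reverse_items: set of zero-based item indices to reverse-score
        (hypothetical; the article does not list which items these were).
    """
    total = 0
    for index, rating in enumerate(responses):
        if not 1 <= rating <= scale_max:
            raise ValueError(f"rating {rating} is outside 1..{scale_max}")
        # Reverse-scoring maps 1 -> 5, 2 -> 4, ..., 5 -> 1 on a 5-point scale.
        total += (scale_max + 1 - rating) if index in reverse_items else rating
    return total


# Hypothetical respondent: 15 items, the last 9 treated as reverse-scored.
example_responses = [4, 3, 5, 2, 4, 3, 2, 1, 3, 2, 4, 2, 1, 3, 2]
example_reverse = set(range(6, 15))
example_total = total_severity_score(example_responses, example_reverse)
```

Under this coding a 15-item total can range from 15 to 75, and a respondent who marks the midpoint on every item scores 45 regardless of which items are reversed.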

Stimulus photographs were selected by having four
white male and four white female graduate students rate
15 photographs of black male children and 14 photo-
graphs of white male children on a scale ranging from
1 (most attractive) to 7 (least attractive). All photo-
graphs were of the waist and above. A standard se-
quence of presentation of the 29 photographs was based
upon a randomization of photographs without regard
to race. To discourage a conservative clustering of
ratings around the mean rating, raters were encouraged
to employ the full numerical 7-point range allowed by
the scale. An analysis of variance to estimate reliability
of measurement (Winer, 1962) produced a highly sig-
nificant interrater reliability (r = .96). The distribution
of ratings for black and white children was approxi-
mately equal, the ratings of each sample being slightly
skewed toward the attractive pole. The average rating
of the most attractive black child and most attractive
white child was identical (2.61). The average ratings
of the least attractive black child and least attractive
white child were approximately equal, 6.66 and 6.00,
respectively. These photographs were selected as test
stimuli.

Classroom incidents were obtained by having four
teachers and 14 graduate students rank-order 15
transgressions according to severity and selecting those
two consistently receiving middle ratings.
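
The interrater reliability for the photograph ratings was estimated from an analysis of variance. As a minimal sketch of one common ANOVA-based estimate (the reliability of the mean of k raters, taken as 1 minus the ratio of the within-photograph mean square to the between-photograph mean square), the Python below works on hypothetical ratings. The function name and the data are illustrative assumptions, and the article does not state which of Winer's formulas was actually applied.

```python
def anova_reliability(ratings):
    """Estimate the reliability of the mean rating across raters.

    ratings[i][j] is the rating given to photograph i by rater j.
    Returns 1 - MS_within / MS_between, where "between" refers to
    differences among photographs and "within" to disagreement among
    raters judging the same photograph.
    """
    n = len(ratings)       # number of photographs
    k = len(ratings[0])    # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    )

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return 1 - ms_within / ms_between


# Hypothetical data: three photographs, two raters, 7-point scale.
perfect_agreement = [[1, 1], [4, 4], [7, 7]]
slight_disagreement = [[1, 2], [4, 3], [6, 7]]

reliability_perfect = anova_reliability(perfect_agreement)
reliability_noisy = anova_reliability(slight_disagreement)
```

When raters agree exactly, the within-photograph mean square is zero and the estimate is 1; small disagreements relative to large differences among photographs keep the estimate close to 1, which is the pattern a value such as .96 reflects.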


Results 


The student teachers’ data were analyzed
using a 2 X 2 X 2 (Student Race X Student
Attractiveness X Preteaching/Postteaching)
analysis of variance with the last variable
treated as a repeated measure. The prac-
ticing teachers’ data were analyzed by five
three-factor analyses of variance, each con-
taining student race and student attrac-
tiveness with the third factor being either
teachers’ race, sex, highest degree earned
(BA, MA, PhD), years experience (0-5, 6-10,
0-n), or story (stealing, tantrum). In all
analyses, the dependent measure was the
total of the teacher and student-teacher
ratings of the 15 items with directionality
(positive and negative valences) taken into
account. Total score was chosen after per-
forming a factor analysis which produced
four undefinable clusters and after rejecting
the alternative procedure of an unwieldy
series of analyses by separate item.


Table 1
Mean Ratings of Severity of Transgressions of Black and White Students
Before and After Student Teaching

                     Time
Student race     Before     After
Black             40.74     44.00
White             41.12     40.64


Analysis of the student-teacher data re-
sulted in a significant Student Race X Pre-
teaching/Postteaching interaction (Table 1),
F(1, 56) = 6.94, p = .01, with student
teachers rating black children's transgres- 
sions significantly more severely following 
their student-teaching experience than be- 
fore it and significantly more severely than 
their preteaching or postteaching ratings of 
white children, with no significant differ- 
ences occurring between black pre- and 
white pre- and postteaching means (t = 3.52, 
F'9 = 3.23; Scheffé, 1953). 
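
The size of the Student Race X Preteaching/Postteaching interaction can be read directly from the four cell means in Table 1. The sketch below is only arithmetic on those published means (with the black postteaching mean taken as 44.00); it does not reproduce the F test, which needs the raw scores, and the variable names are illustrative.

```python
# Cell means from Table 1 (total severity ratings by student teachers).
means = {
    ("black", "before"): 40.74,
    ("black", "after"): 44.00,
    ("white", "before"): 41.12,
    ("white", "after"): 40.64,
}


def change(race):
    """Postteaching mean minus preteaching mean for one student race."""
    return means[(race, "after")] - means[(race, "before")]


black_change = change("black")   # ratings of black children rose
white_change = change("white")   # ratings of white children barely moved

# The interaction contrast is the difference between the two changes.
interaction_contrast = black_change - white_change
```

Ratings of black children's transgressions rose by about 3.26 points while ratings of white children's fell by about 0.48, so the contrast is roughly 3.74 points on a scale whose totals can range from 15 to 75.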

Analyses of practicing teachers’ data re-
sulted in significant main effects for student
attractiveness, F(1, 136) = 4.39, p < .05, with
transgressions by attractive children re-
ceiving more severe ratings; for teachers’
highest degree earned, F(1, 118) = 4.75, p <
.05, with more severe ratings being produced
by teachers with BA degrees than by those
with graduate degrees; and for story, F(1,
136) = 7.31, p < .01, with the story of a child
throwing a tantrum being rated more se-
verely than the story of a child caught
stealing. No significant interactions were
obtained.

Because of the unexpected impact of the
story variable for practicing teachers, the 
student-teacher data were reanalyzed in- 
corporating this variable. No significant 
effects were obtained. 

The correlation between the dependent 
measure and the Pupil Control Ideology 
Scale measure was significant (r = .42, p <
.01). Scores on the dependent measure were
more normally distributed than were scores 
on the Pupil Control Ideology measure. 
This provided assurance that the self-de- 
vised 15-item scale was yielding, to some 
extent, a measure of teacher authoritarian- 
ism as anticipated. 


Discussion


Results of the analyses of student-teacher 
and practicing-teacher data failed to support 
the hypotheses that transgressions of unat- 
tractive children are rated more severely 
than those of attractive children, that black 
children are always less positively evaluated 
than white children of comparable attrac- 
tiveness, and that these relationships obtain 
with both student-teacher and practicing- 
teacher populations. Instead, different 
evaluation patterns occurred for the two 
teacher populations, and in neither of these 
populations was the hypothesized Student 
Race X Student Attractiveness interaction 
manifest. 
The current sample of student teachers 
was affected by students’ race but not by 
students’ attractiveness. While no racial 
bias was exhibited prior to student teaching, 
post-student-teaching ratings showed a 
significant increase in judging the severity 
of black children's transgressions, with no 
comparable changes occurring in the ratings 
of white children. In an attempt to account
for the development of this apparent nega-
tive bias toward black children, it may be
worth noting that the schools in which
these student teachers were placed were 
historically white but currently in transition 
toward racial balance. Perhaps frustration 
encountered in the apprenticeship situation 
of student teaching was projected onto a 
population already targeted as disrupting 
the status quo. An alternate explanation is 
that these student teachers modeled the bias 
of supervising teachers. However, this is less
feasible, since the data on practicing teachers 
were compiled in the same school district 
and demonstrate that if these practicing 
teachers are biased at all, it is by students’ 
attractiveness and not by students’ race. 
The practicing teachers’ data present an
interesting comparison with those of the
student teachers. As mentioned, student
attractiveness appears to be an important 
influencing variable for this population, 
while student race is not. These data, in 
general, suggest that teachers with true 
classroom responsibility (as opposed to ap- 
prenticeship status) are least tolerant of 
misbehaviors by attractive children, re- 
gardless of race. Although these findings 
initially appear to refute the “what is beau-
tiful is good” hypothesis, they may, in actu-
ality, lend support. If it is true that what (or
who) is attractive is generally judged to be 
“good,” then misbehaviors by attractive 
children would present the greatest possible 
discrepancy between what is expected and 
what is observed, and therefore would 
present the greatest threat both to a teach- 
er's self-esteem and to classroom decorum. 
The result would be more severe action to 
reduce that threat. These results are anal-
ogous to findings by Rich (1975), who found
that misbehaviors of attractive children with
negative report cards (discrepancy) were
rated more severely than those of attractive
children with positive report cards. The
contrast of these findings with those of stu-
dent teachers is interesting. Research is
currently under way to assess the reliability
of these differences and to determine
whether the current population of student
teachers will shift their bias from that of
student race to one of student attractiveness
as they themselves shift from student
teachers to practicing teachers.

The reported relationship between stu-
dent attractiveness and the ratings of
transgressions made by practicing teachers
was obtained regardless of the teachers’
highest degree earned and regardless of the
number of years of teaching experience.
However, a significant main effect of highest
degree earned was obtained and is inter-
esting despite its unrelatedness to either the
student race or student attractiveness vari-
able. If the 15-item scale which provides the
dependent measure does in fact measure
authoritarianism, then these findings
suggest that teachers holding the BA degree
are more authoritarian in their approach to
classroom discipline and potentially less
tolerant of student misbehaviors than are
their graduate-degree-holding colleagues.
Explanations for this are only speculative at
this point. These findings may reflect per-
sonality trait differences or they may reflect
the outcome of graduate level education. It
is clear, however, that they do not reflect
years of teaching experience. The correla-
tion between highest degree earned and
number of years of teaching experience was
negligible.


Ninety-four undergraduates completed one of two questionnaire forms that
(a) described a prospective college course, (b) asked them to rate their interest
in the course, and (c) had them estimate four behavioral and attitudinal vari-
ables, assuming future enrollment. The questionnaires were identical except
for the grading system (pass-fail versus standard letter grades) to be used in
the course. The “overjustification hypothesis” proposes that grades under-
mine high intrinsic interest but augment low interest in academic tasks. Pre-
sumably, high interest would be undermined to a lesser degree with pass-fail
than with letter grades. Although the results are partially consistent with this
position, serious questions are raised concerning the application of overjustifi-
cation suppositions in classrooms.


Evaluation of academic performance via 
standard letter grades (A, B, C, D, and F) has 
generated controversy for many years. 
Arguing that grades distract the student 
from learning, encourage competition and 
cheating, and stifle creativity, some educa- 
tors have called for reform. One approach 
in deemphasizing grades that has been fa- 
vorably received in elementary schools is the 
nongraded school concept in which ad-
vancement is usually determined by the
student's reading level (Hillson, 1967). The 
results of studies comparing the effects of 
graded and nongraded approaches on aca- 
demic achievement have been mixed. Some 
researchers have found higher achievement 
by nongraded students (e.g., Skapski, 1960), 
while others have found superior achieve- 
ment by graded students (e.g., Carbone, 
1961). Another approach in deemphasizing 
grades, the pass-fail system, has been rec-
ommended by other educators (Kirschen-
baum, Napier, & Simon, 1971). But recent
studies have shown that undergraduates’
academic achievement is lower under pass-
fail grading than under standard grading
(Bain, Hales, & Rand, 1973; Gold, Reilly,
Silberman, & Lehr, 1971).


Recent theoretical and experimental work
has suggested that the provision of extrinsic
rewards, such as grades, may reduce one’s
intrinsic motivation for the rewarded task.
This finding, called the “overjustification
effect,” has been explained principally by the
“overjustification hypothesis” (Lepper,
Greene, & Nisbett, 1973), which derives from
Bem’s (1967) theory of self-perception. In
short, the overjustification hypothesis pro-
poses that the provision of external reward
serves to undermine one’s intrinsic interest
by suggesting to the individual that the
otherwise enjoyable task is not being per-
formed because it is either inherently in-
teresting or an end in itself; rather, the per-
son assumes something like “If I must be
rewarded for this activity, I must not be very
interested.”

Several studies have reported an over-
justification effect when tangible rewards
were made contingent upon play activity.
For example, Deci (1971, 1972a) found that
college students expecting contingent pay-
ment spent less time doing puzzles than did
control subjects who were not paid. Simi-
larly, Lepper et al. (1973) reported that
young children who were previously re-
warded with a “Good Player” certificate for
drawing pictures subsequently spent less
spare time engaged in this activity. But the
issue is not as clear-cut as such studies
suggest. Other researchers have found the
opposite effect, that is, that reward enhances
children’s subsequent interest in play ac-
tivities (Feingold & Mahoney, 1975; Sarafino
& Stinger, Note 1).

Generalization of the overjustification
effect to the issue of academic grades has
been prompt (Deci, 1972b; Greene & Lepper,
1974). The argument is that if a student is
already interested in an academic task,
grades will undermine this interest. On the
other hand, rewards may be needed if “the
level of initial intrinsic interest in the activity
is very low and some extrinsic device is es-
sential for producing involvement with the
activity” (Lepper et al., 1973, p. 136). Thus,
an important assumption in the overjustifi-
cation hypothesis is that rewards reduce high
interest but increase low interest. Although
this postulate is empirically verifiable, only
one study (Calder & Staw, 1975b) has ex-
amined the interaction between the subject’s
interest in a task and the provision of ex-
trinsic reward. Even though their tasks,
jigsaw puzzles that either formed “inter-
esting pictures” or were blank, were not
rated as especially “enjoyable” by the college
student subjects (the highest group mean for
the puzzles with pictures was +2.3 out of a
possible 8 positive points), the form of the
interaction between intrinsic and extrinsic
factors was consistent with the Lepper et al.
(1973) assumption. These results, if ex-
tended into the academic setting, support
the view that grades may oppositely affect
high and low initial interest.

Although proponents for the generality of
the overjustification hypothesis to education
have not made specific predictions con-
cerning the relative effects of different
grading systems, the writings of Deci and of
Greene and Lepper imply that intrinsic in-
terest would be undermined to a lesser de-
gree with a pass-fail system than with stan-
dard letter grades. The importance of doing
more research specifically concerned
with the influence of grades on motivation
is indicated by the fact that the overjustifi-
cation effect is not found with all reinforcers.
For example, Deci (1971) reported that
verbal praise increased college students' in- 
terest in doing a puzzle. The question thus 
becomes: Do grades produce overjustifica- 
tion, or do they enhance motivation as praise 
does? As yet, there are no studies demon- 
strating that grades may overjustify an aca- 
demic task. 

The present investigation is designed to 
examine the influence on intrinsic motiva- 
tion of two factors: (a) pass-fail versus let- 
ter grading systems and (b) initial level of 
interest. On the basis of the overjustifica- 
tion hypothesis, an interaction between 
these factors is expected. That is, among 
students for whom initial interest in a course 
is high, those who are to be evaluated with 
letter grades should show less motivation 
toward the course than those with pass-fail 
grades. However, the converse should be 
found among students whose initial interest 
is low. 


Method 


Subjects 


The subjects were 94 male and female undergradu- 
ates enrolled in an upper division personnel psychology 
course at Rutgers University. 


Materials and Procedure 


The subjects were tested as a group during a regular 
class meeting using a straightforward questionnaire 
procedure. The first page of a two-page questionnaire
described a fictitious prospective course, which the
students were led to believe was authentic. The content
of the course, entitled “Sociology and the Casinos,” was 
expected to be of interest to many students for two 
reasons: (a) a pilot survey of New Jersey college stu- 
dents suggested this would be the case, and (b) the 
content was topical because New Jersey voters had re- 
cently voted to approve the establishment of state- 
regulated casinos in a highly controversial referendum. 
Immediately below the course description, the subjects 
indicated their interest in it on an 8-point scale from 0 
(labeled “not at all interesting”) to 7 (“very inter- 
esting”). The first page was identical for all subjects. 

The second page had the students predict four of 
their own behaviors or attitudes on the assumption that 
they were to enroll in the prospective course. The in- 
structions at the top of this page provided the experi- 
mental manipulation regarding the grading system that 
would prevail in the course. Approximately half of the 
students were instructed that they would be graded 
“pass-fail”; the remaining subjects were informed that 
grading would be “standard A, B, C, etc., with no 
fail option.” Everything else was identical for all
subjects, including the rationale for the prospective
grading system: “Because the course is brand new and
‘experimental,’ the department has decided ....” In
order of presentation, the four behavioral or attitudinal
variables that the students were asked to predict were
(1) amount of studying, (2) creativity if a paper were
assigned, (3) class attendance, and (4) personal satis-
faction in the course. The students were asked to es-
timate these variables for the prospective course as
compared against all other courses previously taken.
Their predictions were made, once again, on an 8-point
scale from 0 (labeled “much less”) to 7 (“much more”).
It should be noted that Questions 1 (studying) and 3
(attendance), being more concrete and quantitative,
were expected to be the most sensitive of the four.
Personal satisfaction was included be-
cause Calder and Staw (1975a) argued that “reported
task satisfaction” may be an important indication of
interest.

The procedure was as follows: The students’ in-
structor introduced the first author as “Dr.


Results and Discussion 


For each second-page prediction (i.e.,
studying, creativity, attendance, and satis-
faction) the subjects’ ratings were organized
in a 2 X 8 matrix according to the 2 grading
conditions and the 8 initial interest ratings.
Of the 105 questionnaires returned, 11
respondents indicated an interest rating of
3 or less, and their ratings were sparsely
distributed over those 8 cells. Conse-
quently, their data were not analyzed. This
left 94 subjects in a 2 X 4 matrix using the
four highest interest ratings. The distri-
bution of the original sample by grading
conditions and interest ratings in the pro-
spective course is given in Table 1.


Table 1
Total Distribution of Subjects on the Basis of Grading Condition and
Interest Rating

                                  Interest rating
Grading condition          7       6       5       4    3 or less
Pass-fail                  7      13
Standard grades           10      15
% of total at each      16.2    26.7    30.5    16.2      10.5

Note. 7 = very interesting and 0 = not at all interesting.


Separate 2 X 4 analyses of variance were
performed on the subjects’ predictions for
each behavior or attitude. The analysis of
the predicted-studying variable revealed a
significant interaction of grading condition
and interest rating, F(3, 86) = 3.70, p < .05,
shown in Figure 1. The graph depicts
greater predicted studying by students in the
pass-fail condition if their initial interest is
very high (rating = 7). However, at the next
interest level, that is, a rating of 6, the grad-
ing effect is reversed; students in the letter-
grades condition predicted more studying.
Post hoc Newman-Keuls comparisons con-
firmed the differences between grading
conditions at both the 6 and 7 interest levels
(p < .05). The main effects approached
significance for the factors of grading, F(1,
86) = 3.16, p < .10, and initial interest, F(3,
86) = 2.40, p < .10.


Figure 1. Mean predicted studying under pass-fail
and standard letter-grades conditions as a function of
rated interest in a college course. (Predicted studying
was estimated by the subjects on an 8-point scale from
0 [low] to 7 [high].)


The analyses of variance for the three
other questions revealed only one reliable
effect; predicted attendance declined as a
function of interest level, F(3, 86) = 6.91, p <
.01. No interaction (F < 1.0) was found
in the predicted-attendance data. Since the
variables of creativity and personal satis-
faction are vague and abstract, it is not sur-
prising that they produced no reliable ef-
fects. The means and standard deviations
for the dependent variables are given in
Table 2.


Table 2
Means and Standard Deviations for the Variables of Studying, Creativity,
Attendance, and Personal Satisfaction as a Function of Initial Interest
Rating and Grading Condition

                                     Rated interest
Measure and               7             6             5             4
grading condition      M     SD      M     SD      M     SD      M     SD
Studying
  Standard           3.60  1.36    4.47    an    3.93  1.03    3.44    BA
  Pass-fail          4.57   .50    3.46  1.15    3.50   .69    3.13   .60
Creativity
  Standard           5.40  1.11    5.07    16    4.71     E    4.44  1.08
  Pass-fail          4.86   .82    4.38  1.40    4.72    EU    4.25     7
Attendance
  Standard           5.80   .87    4.93  1.00    4.43  1.35    4.00  1.15
  Pass-fail          5.43  1.47    4.85  1.09    4.67   EJ!    4.00  1.00
Satisfaction
  Standard           5.40  1.11    5.47   .87    4.57   .63      4n  1.53
  Pass-fail          5.57     4    5.08  1.06    4.18  1.22    4.12    EL

Note. The scale ranged from 7 to 0, where 7 was “very interesting” and 0 was “not at all interesting.”


The most important findings in the
present investigation are within the inter-
action of grading condition with initial in-
terest, revealed in the predicted-studying
variable. This interaction is consistent with
the hypothesis of Lepper et al. (1973) that
the effect of extrinsic rewards is to under-
mine task motivation in interested individ-
uals while augmenting motivation by those
whose interest is low. However, the present
data also suggest that the incidence of aca-
demic interest high enough to be under-
mined may be rather limited. Of the total
original sample, only about 16% were suffi-
ciently interested. Obviously, this per-
centage must be regarded with caution. The
incidence may have been affected by the
course description, the hypothetical nature 
of the survey, the particular variables being 
measured, and by the fact that the task (i.e., 
the course) is relatively broad compared to 
the more specific activities (e.g., drawing
pictures or doing puzzles) through which the 
overjustification effect has typically been 
studied. Even with these cautions in mind, 
these data suggest that the large majority of 
students may benefit from the use of grades. 
For these students, grades motivate more 
effect corresponds to the 
outcome of recent research (Bain et al., 1973; 
Gold et al., 1971) showing lower achievement under
pass-fail grading than


portant. If grades deemp 
for students possessing high initial interest 
in a task, it is imperative that their interest 
be accurately assessed. The margin for error
may be small, and the educator runs the risk 
of “ ifying" the task for a larger 
proportion of students. Assuming that ac- 
curate assessment can be done, practical 
considerations yield additional problems. 
For example, students who elect to take a 
“very interesting” course may be, in fact, 
highly interested in some topics within the
course but not others. 
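
The reversal in predicted studying between the two highest interest ratings, and the "about 16%" figure for the most interested students, can both be checked with simple arithmetic on the published tables. The sketch below uses the studying means from Table 2 and the rating-7 counts from Table 1; the 4.47 entry is a reading of a damaged cell, and the variable names are illustrative.

```python
# Mean predicted studying (0-7 scale) at the two highest interest ratings,
# as read from Table 2. The 4.47 entry is a reading of a damaged cell.
studying = {
    ("standard", 7): 3.60,
    ("pass_fail", 7): 4.57,
    ("standard", 6): 4.47,
    ("pass_fail", 6): 3.46,
}


def pass_fail_advantage(interest):
    """Pass-fail mean minus standard-grades mean at one interest rating."""
    return studying[("pass_fail", interest)] - studying[("standard", interest)]


advantage_at_7 = pass_fail_advantage(7)
advantage_at_6 = pass_fail_advantage(6)

# A crossover means the sign of the grading effect flips between ratings.
crossover = advantage_at_7 > 0 > advantage_at_6

# Share of the 105 returned questionnaires with the top interest rating:
# 7 pass-fail respondents plus 10 standard-grades respondents (Table 1).
share_rating_7 = (7 + 10) / 105
```

The pass-fail condition is ahead by roughly one scale point at a rating of 7 and behind by roughly one point at a rating of 6, and 17 of 105 respondents (about 16%) gave the top rating, which matches the percentage reported for that column.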

The present data also indicate that the 
specific measure of motivation is pivotal for 
revealing an overjustification effect. While 
the predicted-studying measure showed an 
effect consistent with an overjustification 
interpretation, predicted attendance did not. 
It is unlikely that this difference can be ac- 
counted for on the basis of measure sensi- 
tivity, since attendance was quite sensitive 
to levels of initial interest. It is more likely 
that the grading factor affected these two 
measures differently. This possibility is 
especially interesting because one could 
argue that attendance, being less directly 
conjoined with grades, more validly reflects 

intrinsic interest. If this is so, then interest 
per se is unaffected by grading methods; but 
protask behaviors that determine grades are 
affected. It is possible that the combination 
of standard grades and high interest arouses 
a high level of other motivational compo- 
nents, for example, anxiety, which may de- 
press associated protask behavior. Thus, 
high anxiety might produce avoidance for 
studying because the behavior is closely as- 
sociated with grades, but attendance would 
be relatively unaffected. The issue of valid 
definition and measurement of “interest” 
has been raised by other researchers (Calder 
& Staw, 1975a, 1975b; Feingold & Mahoney, 
1975), but the problem remains unresolved. 
Perhaps this issue is irrelevant to the most 
basic goal of education, namely, to maximize 
learning. It may be that applied educational 
researchers should simply consider which 
behaviors (e.g., studying and attendance) are 
the most crucial for learning and determine 
how they are affected by both grades and 
expressed initial interest. 

In summary, to the authors’ knowledge, 
this is the first study of the overjustification 
hypothesis using grades as the reward in an 
academic task and setting. Although the 
results are partially consistent with the no- 
tion that grades may reduce high intrinsic 
interest, conclusions must be tempered by 
the cautions and questions raised by these 
data. For example, what is the most valid 
measure of academic interest? If atten- 
dance is a more valid measure than studying, 


EDWARD P. SARAFINO AND PATRICK A. DIMATTIA


then the overjustification hypothesis is not
confirmed, and other motivational expla-
nations become more plausible for these
data. Furthermore, even though the survey
was conducted realistically and convincingly,
what students predict they would do in a
hypothetical situation may be different from
their actual behavior. Considering such
issues and those presented by studies dem-
onstrating different effects of specific
rewards (Deci, 1971; Sarafino & Stinger, Note 1)
on protask behavior, much more educational
research will be needed to determine if and
how the overjustification hypothesis should
be applied in classrooms.


Reference Note 
dren's performance and intrinsic interest with ex-
trinsic reward. Manuscript in preparation, 1978,


Selective Imitation of Complex Sentences
in a Preschool Setting


Gary Novak
California State College, Stanislaus


An experiment was conducted to determine the conditions under which selec-
tive imitation of relative clauses could be produced in preschool. Seventy-two
preschoolers took part in a 2 (modeling, imitation training) X 2 (instructions,
no instructions) X 2 (reinforcement, no reinforcement) X 3 (baseline, Probe 1,
Probe 2) factorial design. Imitation training was more effective than mod-
eling alone, and instructions and reinforcement were effective in an additive
manner. Relative clause usage increased significantly after imitation training
but not after modeling or no treatment. The results are discussed in terms of
the efficiency of various teacher techniques in increasing language usage. Im-
plications for the role of imitation in language development are also dis-
cussed.


Imitation is considered a common pro-
cess by which children acquire language.
However, the usefulness of imitation in
generating novel language forms has been
questioned (Dale, 1972; Ervin, 1964) on the
basis that imitations of adult speech by
children are demonstrably often less, and
rarely more, grammatically advanced than
the children’s own spontaneous speech.
Further, these critics attack the role of imi-
tation from a logical standpoint: A proce-
dure that results in mere replication of
grammatical patterns cannot also result in
generative progressive development (e.g.,
Ervin, 1964).

These criticisms have been largely dis-
pelled by the notion of “selective imitation”
(e.g., Whitehurst & Novak, 1973; Whitehurst
& Vasta, 1975), which illustrates how lan-
guage can be imitative on some aspects (e.g.,
syntactic) yet be simultaneously novel on
others (e.g., semantic). Since studies illus-
trating selective imitation have been con-
ducted only with individual subjects in lab-
oratory settings, questions as to the validity
of the selective imitation approach to the
classroom teaching situation remain. In
addition, the specific features of the class-
room teaching situation which facilitate se-
lective imitation of language production
need to be explored.


This study is based upon a dissertation submitted to
the State University of New York at Stony Brook in
partial fulfillment of the requirements for the doctoral
degree. The author expresses his appreciation to
Grover J. Whitehurst for constant support in this
project, to Robert M. Liebert, Leonard Krasner, and
Frank Anshen for serving as members of his dissertation
committee, and to Nina Toscano, Alan Rowland, Bert
Motyoma, and Sue Dellefield for serving as experi-
menters.
Requests for reprints should be sent to Gary Novak,
California State College, Stanislaus, Turlock, California 95380.

In the present investigation an experiment
was conducted to determine the conditions
under which selective imitation of a low-
frequency syntactic structure (relative
clauses) could be obtained in a classroom
setting. Cazden (1964) had looked at the
role of modeling and modeling with expan-
sions in enhancing syntactic complexity of
children’s language in a Head Start class-
room. However, Cazden failed to control
the syntax used by the adults. Conse-
quently, Cazden was unable to determine
whether any syntactic imitation occurred.
In the present study, the relative clause was
selected as the target structure, since it oc-
curred with low frequency in the language of
the preschool children. Since the relative
clause also occurs at a low rate in adult
speech, extensive modeling of this structure
could be essentially restricted to the teacher
storytelling condition in this study.

A storytelling situation was used as a basis
for modeling and imitation training proce-
dures. Storytelling was selected because it
is a frequent school activity which provides
a structured situation for both modeling and
imitation training procedures. In modeling,
the teacher reads the story, essentially modeling
particular sentences, and the child listens
(although some children may echo the story),
and no feedback is provided. Imitation
training, in contrast, requires the child to
repeat a sentence modeled by the teacher,
feedback is provided, and the child is re-
quired to repeat the utterance until the imi-
tation is accurate. This procedure is more
time consuming and laborious but is a true
teaching procedure.


Method 


Seventy-two preschool children ranging in age from 
months to 62 months served as subjects. Means and 

standard deviations for each cell are shown in Table 1.
Children were enrolled in one of three preschools
located in Stanislaus County, California. The classroom
teacher selected four subjects at a time, and this
group was assigned to a treatment by use of a randomly
generated schedule.


Apparatus


Five different comic books served as the stimulus
materials, ensuring varied semantic content in each
condition. The Flintstones was used for baseline, Chip
and Dale and Super Goof for Treatments 1 and 2, and
Dennis the Menace and Hot Stuff for Probes 1 and 2.
All groups received the same sequence of books. Sessions
were held in a room inside each preschool in which

Table 1
Mean Untransformed Percent of Relative Clause Usage
regular storytelling sessions had been held. The author
revised the Treatment 1 and 2 comic book stories so as
to make up scripts of prepared utterances for the
teacher to read during treatment phases. All of the
sentences in the revised stories were prepared utterances,
and each prepared utterance was a relative clause
sentence. A relative clause sentence was defined as an
utterance containing two predications, one of which was
a dependent formal predication (i.e., containing a verb)
in which an implicit or explicit relative pronoun (who,
which, that) or an adverb (when, where, why) referred
to the predicational unit on which the clause was
grammatically dependent (Hathaway, 1967). An example
of a relative clause complex sentence is, “John,
who is fishing, is late for supper.”

The study was conducted by five individuals. The
teacher-model was a female undergraduate who had had
previous experience with teaching young children. Two
male undergraduates, one female undergraduate, and
the present author served as the four experimenters.
The teacher and experimenters received training on
storytelling procedures and on reliable identification
of relative clauses prior to the start of the experiment.
Four Sony TC56 tape recorders with condenser microphones
were placed by the experimenters between them
and their subject for baseline and child storytelling.


Design 


The experiment was designed as a 2 (modeling, imitation
training) × 2 (instructions, no instructions) × 2
(reinforcement, no reinforcement) × 3 (baseline, Probe
1, Probe 2) factorial design, with 8 subjects per cell.
Groups of children received two teacher treatment
phases, differing only in terms of the comics read, and
two probe conditions, which also differed only with
respect to the comics read. An additional single control
group of 8 subjects was run. This group received only
the baseline and probe trials. The control group had
contact with the teacher but no teacher storytelling
treatment.
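
The cell structure of this design can be checked with a short enumeration. The Python sketch below is illustrative only: the variable names are hypothetical, and treating phase as a repeated measure on every child is an assumption drawn from the procedure rather than something the text states outright.

```python
from itertools import product

# Between-subjects factors, named as in the Design paragraph.
teaching = ["modeling", "imitation training"]
instructions = ["instructions", "no instructions"]
reinforcement = ["reinforcement", "no reinforcement"]

# Assumed repeated measure: every child is observed in each phase.
phases = ["baseline", "Probe 1", "Probe 2"]

SUBJECTS_PER_CELL = 8   # stated in the text
CONTROL_SUBJECTS = 8    # single additional control group

# Every combination of the three between-subjects factors is one cell.
cells = list(product(teaching, instructions, reinforcement))

total_subjects = len(cells) * SUBJECTS_PER_CELL + CONTROL_SUBJECTS
total_observations = total_subjects * len(phases)

print(len(cells), total_subjects, total_observations)
```

Under these assumptions the eight treatment cells plus the control group account for the 72 children reported in the Method section.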


Procedure 


                                           Age in months
Treatment                                  M       SD      Baseline   Probe 1   Probe 2
Modeling                                   48.63   6.30    0          …         …
Modeling + instructions                    52.63   6.09    …          …         …
Modeling + reinforcement                   45.13   7.81    0          3.15      2.25
Modeling + instructions + reinforcement    52.63   9.47    0          …         2.95
Training                                   48.63   …       0          …         …
Training + instructions                    49.75   …       …          …         3.00
Training + reinforcement                   49.86   …       …          9.25      12.5
Training + instructions + reinforcement    …       …       …          …         …
Control                                    …       …       0          0         0


The experiment consisted of five successive conditions:
child storytelling - baseline, teacher storytelling -
Treatment 1, child storytelling - Probe 1, teacher
storytelling - Treatment 2, and child storytelling -
Probe 2. Groups of four children were run, with each
child in the group receiving the same conditions.
A session is described below:
Child storytelling - baseline. Four children were 
their 


happening next?” The ex 
prompts, The condition 
experimenter said, "That was really good, 
to sit in the center of the room in front of the 
storytelling ~ Treatment 1. During this 
teacher-model-mediated variables of 


imitation The 
ee 
who opened 


i 
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sentence, the model sentence repeated 
teacher, W the cid did pot lata rorectiy. cn ode 
ditional attempt was made. Each was called 


The purpose of this 
session was to auem the effects of the teacher- 

soryullag on the children’s vacui. The merat 
whether selective imitation has 
back to the corners of the room and, using novel
books, asked the children to "tell the story of this book"
for 3 minutes. This condition followed the format of
the baseline. One of four experimenter-mediated
treatments was used in each group. All conditions were
as above except as noted.

Reinforcement. The experimenter praised the
subject whenever the child said a relative clause
sentence.

Instructions. If a child failed to use a relative clause
in his first utterance, the experimenter said, “No, I want
you to tell the story the way the teacher did.” The
instruction was repeated if the child failed to use a
relative clause sentence in three consecutive utterances.

Reinforcement plus instructions. This condition
combined both reinforcement and instructions, so that
if the subject used a relative clause sentence, he was
praised. If the child failed to use a relative clause
sentence, he was instructed as above.

No reinforcement, no instructions. The experimenter
followed the procedures for baseline.

Conditions were kept short to prevent the children
from becoming bored and frustrated. In order to
increase the effectiveness of the conditions, both child
and teacher storytelling were repeated.

Teacher storytelling - Treatment 2. The experimenter
again had the children sit in front of the teacher.
A new comic was read using the same condition the
group received in Treatment 1.

Child storytelling - Probe 2. Returning to their
storytelling corners, each child told about a new book
to an experimenter for 3 additional minutes.


scored only if all parts of the sentence were complete.
One interscorer reliability check was performed on each
experimenter in each condition. The transcripts were
scored independently by two trained graduate students
who counted relative clause complex sentences and
mean length of utterances. The author used a random
numbers table to randomly select a subject in each
condition to be scored for interrater reliability by an
independent scorer.


Results 
Reliability 


Interrater reliability estimates of transcription
accuracy were obtained for each experimenter by the
agreement/agreement-plus-disagreement method.
An independent scorer transcribed a session by each
experimenter for each condition.

Two measures of reliability were scored: length of
utterance and usage of relative clauses. Agreement for
length was defined as identical number of words per
sentence. Agreement on relative clauses was defined
as occurrence of relative clause per sentence.
The mean reliability was .87 for length and
Vi for relative clauses. 
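
The agreement/agreement-plus-disagreement method reduces to a simple proportion. The sketch below is a minimal Python illustration, not the authors' scoring procedure: the function name and the two lists of per-sentence word counts are hypothetical, and agreement is taken to mean identical values for the same sentence, as in the definition of length agreement above.

```python
def agreement_ratio(scorer_a, scorer_b):
    """Agreements divided by (agreements + disagreements) over paired judgments."""
    if len(scorer_a) != len(scorer_b):
        raise ValueError("both scorers must judge the same sentences")
    if not scorer_a:
        raise ValueError("at least one sentence is required")
    agreements = sum(a == b for a, b in zip(scorer_a, scorer_b))
    disagreements = len(scorer_a) - agreements
    return agreements / (agreements + disagreements)


# Hypothetical word counts per sentence from two independent transcriptions.
words_scorer_a = [5, 7, 4, 9, 6, 8, 5, 10]
words_scorer_b = [5, 7, 4, 8, 6, 8, 5, 10]

length_reliability = agreement_ratio(words_scorer_a, words_scorer_b)
print(length_reliability)
```

The same function applies to relative clause agreement if each list holds a per-sentence indicator (clause present or absent) instead of a word count.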


Accuracy of Elicited Imitation


As a check on the effectiveness of the imitation
training, one of the experimenters sat behind the
children and scored their elicited imitations for
accuracy. The observer used the same criteria for
accuracy as used by the teacher.
The percentage of correct imitations on the first
trial was 40%. The mean number of trials for accurate
reproduction was 1.52, with 18% of the trials resulting
in failure to imitate accurately after three attempted
imitations. Refusals to imitate also occurred on 9% of
all possible sentences.


Relative Clause Usage


A one-way analysis of variance was computed to test
for differences between the groups on baseline usage of
relative clauses. The differences proved to be
nonsignificant, F(7, 56) = .17.
Mean percentage of relative clause usage was
calculated, and the results appear in Table 1. Since
there were a large number of children emitting no
clauses, the distribution was positively skewed. An arc
sine transformation,
$\arcsin\sqrt{X/(N + 1)} + \arcsin\sqrt{(X + 1)/(N + 1)} = X'$,
was performed to normalize the sample distribution.
A 2 (treatment) × 2 (instructions) × 2 (reinforcement)
× 3 (phase) analysis of variance was performed on the
transformed scores. The results of this analysis showed
that all main effects were significant at or beyond the
.05 level. The main effects for treatments and
reinforcement were significant at the .01 level.
None of the interactions were found to be significant
at the .05 level.
Imitation training plus modeling was found to be more
effective than modeling alone (M_T = 3.32292 vs.
M_M = .68750). Instructions were found to be
significantly more effective than no instructions
(M_I = … vs. M_NI = …). Reinforcement was more
effective than no reinforcement (M_R = 3.32292 vs.
M_NR = .68750). The Newman-Keuls procedure was used
to test the differences between the phase means.
Significant differences were found between baseline
and Probe 1 (p < .05) and baseline and Probe 2
(p < .05). The small increase from Probe 1
(M_P1 = 2.64063) to Probe 2 (M_P2 = 2.78125) was not
significant. The control group emitted no relative
clauses across the three phases, ruling out warm-up
or familiarity with the procedure as a causative
variable. A separate 3 (treatment) × 3 (phase)
analysis of variance was calculated, and the results
showed no significant main effects. However, a
significant Treatment × Phase interaction was obtained
for the transformed data, F(4, 42) = 2.77, p < .05.
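
The arc sine transformation applied before the analysis of variance can be illustrated as follows. This Python sketch assumes the standard double arc sine (Freeman-Tukey) form with square roots, and it assumes that X is a child's count of relative clause sentences out of N scored sentences; neither the function name nor the example counts come from the source.

```python
import math


def double_arcsine(x, n):
    """arcsin(sqrt(x / (n + 1))) + arcsin(sqrt((x + 1) / (n + 1))), in radians.

    x: number of relative clause sentences (assumed count, 0 <= x <= n)
    n: number of sentences scored for the child
    """
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    first = math.asin(math.sqrt(x / (n + 1)))
    second = math.asin(math.sqrt((x + 1) / (n + 1)))
    return first + second


# Hypothetical children who each produced 9 scored sentences.
for clauses in (0, 1, 3):
    print(clauses, round(double_arcsine(clauses, 9), 4))
```

Because a child with no relative clauses still receives a nonzero transformed score, the many zero scores are spread less abruptly than in the raw percentages, which is the usual motivation for this transformation with skewed count data.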

A significant main effect indicated that an 
increase in relative clause usage was due to 
a superiority of training over modeling. In 
addition, instructions and reinforcement 
were also effective. These effects occurred 
on the first probe session and were main- 
tained, but not increased significantly, in 
Probe 2. The increase was not due to mere
exposure to the comic books or a warm-up on
the baseline and probes, since a no-modeling,
no-training control group did not increase
clause usage.

Although small in magnitude, the effects 
of training, reinforcement, and instructions 
can be considered additive rather than 
multiplicative. Maximum effect would be 
achieved with training plus instructions plus 
reinforcement. 


Discussion 
"The results indicated that the frequency 
structure, the 


cond 
answered the question of the conditions
under which selective imitation is likely
to occur: Imitation training plus modeling
was more effective than simple modeling. It
appears that the overt responding and im- 
mediate feedback of imitation training pro- 


a vis imitation training plus mod f 
producing selective imitation of the relative 


though he does not comprehend the f 
"The role of imitation training may be tois" 
crease imitation of such constructions 
the child knows little about and, therefi 
advance knowledge of the form in both 

and productive mode. Using 
a form would establish ideal conditions 
discrepancy between usage and knowledge.
The reduction of this discrepancy, according
to Brown and Hanlon (1970), is the basis of
the role of imitation in acquiring new
language forms.

The linguistic complexity of the
construction must be considered when
accounting for differential environmental
effects. For example, Lahey (1971) found
simple modeling to be effective in increasing
use of a fairly simple construction,
descriptive adjectives, by Head Start children.




tional, participial) were su 
through simple modeling, l 


child's existing repertoire. Modeling
seemed to increase usage of constructions
already learned by the child, whereas
imitation training seemed to teach the
construction. Whether or not the construction
exists in the child's repertoire may be a
function of the linguistic complexity of the
construction (Menyuk, 1969) or some other
variable including cognitive complexity,
functionality, or frequency of adult usage of
the construction.
Research on teachers as leaders is reviewed, and inconsistencies …,
showing that current approaches have produced unclear results. A new
approach, viewing teachers in a situational context, is suggested and
illustrated. One hundred fifty-five students filled out two
questionnaires; the first (administered during the 5th week of class)
contained measures of teacher leadership behavior and student role
clarity, and the second (administered during the 10th week) measured
overall satisfaction with the class. Student performance was also
assessed at the 10th week. It was found that (a) student performance
significantly correlated with teacher supportiveness and directiveness
under low role but not high role clarity, and (b) the differences in
these correlations were statistically significant.


Requests for reprints should be sent to …, Graduate …, Kent
State University, Kent, Ohio 44242.

Copyright 1978 by the American Psychological Association, Inc.


The literature on teachers as leaders has
two striking parallels with the literature on
leadership in other settings. First, person-centered
and task-centered factors have emerged in most
studies of industrial, military, and educational
leadership, so that it is now commonly accepted that
industrial, military, and educational leader behavior
can be characterized along these two dimensions
(Stogdill, 1974). Similarly, person-centered and
task-centered factors have emerged in studies
examining teacher behavior (e.g., Coffman, 1954;
Gibb, 1955; McKeachie, 1961; Ryans, 1960). This
is encouraging, since it supports the contention of
Gagné (1965) and others that teachers are leaders
and that the study of teaching effectiveness may be
advanced by examining teacher leadership behaviors.

Second, with few exceptions research on teachers
as leaders has tended to look for a "best" way or
"best" leadership style, without taking the specifics
of the teaching situation into account. For example,
such factors as the personal relationships between
the teacher and the students, the structure of the
learning task, and the teacher's power in the class
have been ignored (Fiedler, 1…). Accompanying this
search for a best way is the fact that relationships
among person-centered and task-centered teacher
behaviors and various effectiveness criteria are not
at all clear or consistent. For example, in one
study, supportive (person-oriented) teacher behavior
was found to significantly enhance student
performance on examinations (Dawson et al., 1972),
but in another, directive (task-oriented) behaviors
resulted in significantly more learning than
supportive behavior (Jabs, 1975). Similarly, with
respect to the relationship between teacher
effectiveness and leadership style, significant
correlations have been obtained between a teacher's
evaluation and supportive behavior (Lahat-Mandelbaum
& Kipnis, 1973), as well as directive behavior
(Sullivan & Skanes, 1974). Findings with respect to
other criteria have also been unclear.

This is remarkably similar to both the
approach and the findings of industrial,
military, and educational theorists during
what has been termed the "behavioral
phase" of leadership research (Jacobs, 1970).
During this period, running from about 1945
… satisfaction and performance under all
situations). However, the results obtained
were highly unclear and inconsistent, so that
no one best way to lead was found. As a
result, researchers in these areas have recently
begun to study the effects of the situation in
which leaders and subordinates find themselves,
and accompanying this new approach,
consistent and significant findings have
begun to emerge (Kerr, Schriesheim, Murphy,
& Stogdill, 1974). Thus, contemporary
industrial, military, and educational leadership
researchers now view leadership as a
process involving interactions among the
leader, the led, and the situation, and this

Based on the literature and arguments summarized
above, it seems that (a) much current teaching
leadership research is in a behavioral phase, and
(b) no situational research exists upon which to
build theories or hypotheses in the teaching
leadership area.
However, it also seems clear that use of a 
situational approach has substantial promise 
for providing consistent and interpretable 
teacher leadership findings. 

Because of this, the literature on indus- 
trial, military, and educational leadership 
was examined in the hope of finding testable 
hypotheses which would seem generalizable 
to the teaching situation and also be suitable 
illustrations of the usefulness of a situational 
approach to classroom leadership. Al- 
though a number seemed acceptable (see 
Kerr et al., 1974; Stogdill, 1974, for reviews 
of these possible hypotheses), it was decided 
to test two hypotheses which have received 
much support in this literature. These, 
modified from Kerr et al. (1974, pp. 73-74) 
with illustrative supportive references, are 
as follows: 

1. The greater the amount of student
role clarity, the greater will be the relationship
between supportive (person-oriented)
teacher behavior and student satisfaction
and performance.

2. The lower the amount of student role
clarity, the greater will be the relationship
between directive (task-oriented) teacher
behavior and student satisfaction and
performance (e.g., Dessler, 1972; House, 1971;
House & Dessler, 1974).

Kerr et al. (1974) used Rizzo, House, and
Lirtzman's (1970) definition of role clarity
(the degree to which respondents see their
role demands as unambiguous and predictable)
in the above hypotheses, and most of
the studies they cite used Rizzo et al.'s
measure of role clarity. (For comparability,
both the definition and measure of Rizzo et
al. will also be used in the present study.)

The logic underlying these hypotheses
comes from the path-goal theory of leadership
effectiveness (House, 1971; House &
Dessler, 1974). Essentially, it is that when
student roles are clear, students are more likely
to see the class as less interesting, and thus
teacher supportive behavior may offset a
lack of intrinsic class satisfaction. Similarly,
when student roles are poorly defined, students
are more likely to see the class as interesting
but will need teacher directiveness to perform
well and derive satisfaction from it.


Method 


Sample 


The initial sample consisted of 173 freshman students
at a large midwestern university taking an identical
academic course. The course had 76 sections, and from
these 16 were chosen at random to participate in the
study (each class had a different instructor). Incomplete
questionnaires were obtained from 18 students,
and they were excluded from the analysis, producing a
final sample size of 155.

The course is a comprehensive orientation class
covering an overview of university resources, study
skills, future career exploration, and course registration
procedures. The class lasts for 10 weeks, and students
receive grades on a pass-fail system.
Procedure 

Two questionnaires were administered to the students
during regular class time. Both included a cover
letter assuring them … and that


“this questionnaire is not a test of ability or consistency, 


SD = 100 
Class 
tenth measure hearsay 
‘chen, Dewis, a MSQ: 
to measure or general AU diam 


skills (items 5 and 7), and the instructor's 
knowledge (items 6 and 12), Then inem en, 
the procedure. 


Theoretical definitions and revised items were presented
(in a counterbalanced design) to a panel of 14 judges who
sorted them into six piles (one for each theoretical
satisfaction facet). Since only one judge misassigned only
one item, all 12 items were retained and used in the
measure. Internal consistency reliabilities for the MSQ
of .85 or better are reported in the manual for over two
dozen studies, along with data pertaining to the
instrument's validity; the reliability of this modified
measure was .86 as measured by coefficient alpha in the
current sample (M = 44.90, SD = 6.65).
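
Coefficient alpha, the reliability index reported for each scale, can be computed from item-level responses as sketched below. This is a generic Python illustration of the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores); the function name and the small response matrix are hypothetical and are not data from the study.

```python
from statistics import variance


def coefficient_alpha(responses):
    """Cronbach's coefficient alpha.

    responses: list of respondents, each a list of k item scores.
    """
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent needs the same k >= 2 items")
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical 5-point ratings: four respondents, three items.
ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 2],
]
print(round(coefficient_alpha(ratings), 3))
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency.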

Role clarity measure. Four items from the 6-item
scale of Rizzo et al. (1970) were modified to measure
classroom role clarity (see Table 1). The 6-item Rizzo
et al. scale has been shown to be factorially independent
of role conflict and to have reliabilities in excess of …
in seven different samples (Dessler, 1972); data
concerning the validity and reliability of this instrument
are presented in Rizzo et al. (1970). The reliability of
this 4-item version was .85 in the present sample (M =
15.39, SD = 2.97).

Teacher leadership measures. As shown in Table
1, two types of teacher leadership behavior were
measured: support (6 items) and direction (4 items). Both
scales use items which have been subjected to extensive
refinement, testing, and validation (Schriesheim, 1978).
The measure of support involves person-oriented
teacher behaviors that show concern for student
personal needs and welfare; direction concerns
task-oriented teacher behaviors that are aimed at giving
students a clear understanding of what is expected of them.
These two scales are similar in content to and were
derived from (and developed as shorter versions of) the
Ohio State leadership measures of Consideration (leader
behavior that emphasizes concern for subordinates’
well-being and feelings) and Initiating Structure (leader
behavior that defines and structures the leader's role
and those of subordinates; Stogdill, 1963). Coefficient
alpha reliabilities of .81 (M = 24.03, SD = 3.51)
for support and .81 (M = 15.77, SD = 2.61) for direction
were obtained in the current sample.


Method of Analysis 


As noted very briefly above, a predictive correlational
design was employed. The predictor (teacher leadership)
and moderator (student role clarity) variable
measures were gathered during the 5th week of class.
These were correlated with the criterion variables
(student class satisfaction and performance), which
were measured during the 10th week. The analysis
proceeded as follows.

First, to determine whether restriction of range in the
teacher behavior and classroom satisfaction variables
could weaken the hypothesized relationships, means
and standard deviations were calculated on these
variables by class. Next, for each variable the average of
the class standard deviations was compared with the
normative standard deviation, to determine whether
range restriction was likely to exist.

Then, to determine whether the predictor variables
(teacher leadership) and the moderator variable (role
clarity) were highly related, correlations were computed
among these variables (high correlations would indicate
that the moderator was similar to the leadership variables

Measures Employed 


work by myself. 
variety in the class nts. 
my instructor 
"know-how" of my 

he way my instructor takes care 


"The chance to make decisions on my own. 
he amount of 


he extent of knowledge my instructor 


or... 


€ 
irplains the lev 


- Gives me unclear g 


pe are uie goals and obj 
i know exactly what is ex 
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Student role clarity" 
ves for me. 


at my responsibilities are. 
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Class satisfaction* 
spirit of cooperation among my fellow students, 


NER noticed when I do a good job on an assignment. 

the students. 

dh of individual complaints brought to him/her by the stu- 
"The chance to do different things from time to time. 


; ition for the work I do. 
‘The chance to develop close Pewter s ith my 


students. 
about University programs and policies. 
Teacher support 


of me. 
am given clear explanations of what has to be done. 


categories: 
item (reverse-scored), 


Response categories: 1 = very dissatisfied; 2 = dissatisfied; 3 = undecided or neutral; 4 = satisfied; 5 = very satisfied.
Response categories: 1 = very false; 2 = false; 3 = neither true nor false; 4 = true; 5 = very true.


and thus that a situational or moderator effect
could not be detected).

Zero-order correlations were computed among the
predictor variables, to determine their degree of
independence. The criterion variables were also
correlated, to determine whether these were independent
as well.

Next, the sample was divided at the median of role
clarity (15) into two moderator groups: low role clarity
(≤15; M = 12.66, SD = 2.42; n = 62), and high role
clarity (>15; M = 17.16, SD = 1.60; n = 93). The
difference in mean level of role clarity between the two
groups was computed and tested for significance.
A large and significant difference was obtained (t = …,
p < .001), and it could be concluded that the low
group was in fact low in relation to the high group.
Correlations were then computed among the predictor and
criterion variables for each moderator group (separately).
Last, the differences between the correlations
for the low and high groups were computed and tested
for significance.
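
The final step, testing whether a correlation differs between the low and high role clarity groups, can be sketched as follows. The source does not name the test it used; this Python example assumes Fisher's r-to-z test for two independent correlations, and the function name is hypothetical. The group sizes (62 and 93) come from the median split described above, and the two correlations (-.20 and .04) are the support-performance values reported in the article's results. Since the reported values are partial correlations, a stricter version would subtract one more degree of freedom per group.

```python
import math


def fisher_z_difference(r_low, n_low, r_high, n_high):
    """z statistic and one-tailed p for r_high - r_low (independent samples)."""
    z_low = math.atanh(r_low)    # Fisher r-to-z transform
    z_high = math.atanh(r_high)
    standard_error = math.sqrt(1 / (n_low - 3) + 1 / (n_high - 3))
    z = (z_high - z_low) / standard_error
    # Upper-tail area of the standard normal distribution at |z|.
    p_one_tailed = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return z, p_one_tailed


z_stat, p_value = fisher_z_difference(-0.20, 62, 0.04, 93)
print(round(z_stat, 2), round(p_value, 3))
```

Under these assumptions the difference falls between the .10 and .05 one-tailed levels, which is in line with the significance marker the article attaches to this difference.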


Results 


Methodological Results
Table 2 presents the means and standard
deviations computed for the teacher leadership
and student satisfaction variables for
each of the 16 classes. Table 2 also presents
normative means and standard deviations
for these scales, derived from similar measures
(see the explanatory footnotes to Table
2 for details). As shown in Table 2, restriction
of range does not seem to represent a
problem for the current sample: the standard
deviations for all three variables are
neither appreciably nor statistically different
from those shown for the normative data.
Furthermore, comparing the standard deviations
for support, direction, and satisfaction
for the total sample (3.51, 2.61, and
6.65, respectively) with the normative values
presented in Table 2 also shows that range
restriction is not a problem.


Table 2
Means and Standard Deviations for Leader Behavior and Satisfaction Variables by Class

                            Support           Direction         Satisfaction
Class                n      M       SD        M       SD        M       SD
1                    11     23.55   6.14      16.64   2.06      44.55   10.39
2                    10     24.50   2.84      14.40   2.22      42.40   7.01
3                    11     26.00   2.97      17.00   3.03      48.46   3.21
4                    11     22.73   3.61      15.64   3.23      44.00   1.39
5                    5      23.80   3.63      16.00   2.12      44.40   2.51
6                    8      25.63   2.07      16.88   2.70      51.13   5.08
7                    12     21.92   2.47      14.00   2.47      41.25   8.72
8                    6      27.83   2.32      17.67   1.86      45.83   4.71
9                    6      25.83   1.60      16.83   1.72      47.83   3.39
10                   12     24.00   2.73      15.93   1.78      41.75   8.92
11                   12     23.50   2.68      15.75   2.93      42.92   7.24
12                   10     23.40   4.17      15.80   2.57      45.00   6.38
13                   9      21.33   2.50      14.67   2.29      45.33   5.72
14                   10     23.80   3.88      14.60   2.95      45.00   3.37
15                   10     24.90   2.56      16.70   2.11      45.10   3.04
16                   12     24.33   2.96      16.00   3.13      47.33   4.31
Average              155    24.03   3.17      15.77   2.48      44.90   6.02
Normative average           23.29ᵃ  3.36ᵃ     15.93ᵇ  2.08ᵇ     45.85ᶜ  6.60ᶜ

ᵃ Obtained by averaging the normative data given by Stogdill (1963) for 943 subjects in 9 samples, using the Consideration scale of the Ohio State Leader Behavior Description Questionnaire; to standardize to a common length instrument, this result was multiplied by the ratio of support items to Consideration items (6/10), per the suggestion of McNemar (1969, p. 25).
ᵇ Obtained by averaging the same normative data given by Stogdill (1963), using the Initiating Structure scale; standardized to a common length using a ratio of 4/10.
ᶜ Obtained by averaging the “general satisfaction” normative data given by Weiss, Dawis, England, and Lofquist (1967) for 24… subjects in 25 samples, using a transformation factor of 12/20.


Correlations were computed between the
predictor variables and the criterion variables,
to determine whether they were independent.
The two criterion variables, satisfaction and
performance, were uncorrelated (r = .17, ns),
but a significant (p < .01) and substantial
correlation was obtained between the two
predictor (leadership) variables (r = .40).
For this reason, to allow the drawing of less
confounded conclusions, all subsequent analyses
used partial correlations (removing the effect
of one teacher leader behavior from correlations
between the other and the dependent variables),
as suggested by House and Dessler (1974).
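
Removing the effect of one leader behavior from the correlation between the other behavior and a criterion is a first-order partial correlation. The Python sketch below shows the standard formula; the function name and two of the three input correlations are hypothetical, and only the .40 correlation between support and direction is taken from the text.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator


r_support_direction = 0.40        # reported correlation between the predictors
r_support_criterion = 0.30        # hypothetical zero-order correlation
r_direction_criterion = 0.20      # hypothetical zero-order correlation

# Support-criterion relationship with direction partialled out.
r_partial = partial_correlation(
    r_support_criterion, r_support_direction, r_direction_criterion
)
print(round(r_partial, 3))
```

When the controlled variable is unrelated to both of the others, the partial correlation equals the zero-order correlation, so the adjustment matters here only because the two leadership scales overlap.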


The last methodological consideration
examined prior to the hypotheses tests was
whether the moderator was highly correlated
with the two predictors. Here, correlations
were computed between role clarity and
supportive and directive teacher leader
behavior, using the average scores obtained for
each class (n = 16) as well as the scores for
each respondent in the sample (N = 155).
The results of these analyses support the use
of role clarity as a moderator in this sample.
Supportive teacher behavior had nonsignificant
correlations with role clarity at both
the class (r = .24) and individual respondent
(r = .09) levels. Directive teacher behavior
had a nonsignificant correlation with role
clarity at the class level (r = .41), but the
correlation was significant (p < .01) for the
analysis with individual respondents (r =
.49). However, since this significant correlation
is still far from unity (it involves only
24% shared variance), it does not seem
unreasonable to use role clarity as a moderator
of directive teacher leadership-student
satisfaction and leadership-performance
relationships.


Table 3

                                           Correlations for
                                           moderator subgroups
Independent-dependent     Unmoderated                             Difference
variables                 correlationᵃ     LRCᵇ       HRCᶜ        (HRC - LRC)
Support-performance        .01             -.20†       .04         .24*
Support-satisfaction       …               …           …           …
Direction-performance      .07              .27       -.05        -.32**
Direction-satisfaction     .16              .13        .18         .05

ᵇ Low role clarity; n = 62.
ᶜ High role clarity; n = 93.
* p < .10, one-tailed. ** p < .05, one-tailed. † p < .10, two-tailed. †† p < .05, two-tailed. ††† p < .01, two-tailed.


Hypotheses Results


Table 3 presents the results of the analyses
carried out to test the two hypotheses. As
shown in Table 3, these results support the
hypotheses with respect to the performance
variable, but not with respect to satisfaction.
Specifically, it can be seen from Table 3 that
the support-performance and direction-performance
partial correlations are significant
under low role clarity and nonsignificant
under high role clarity. Furthermore,
the support-performance correlation is
significantly higher under high role clarity (as
predicted), and the direction-performance
correlation is significantly lower (as also
predicted).
The results for student class satisfaction
are not consistent with the hypotheses. For
the support-satisfaction relationship,
significant and positive correlations are
obtained under both low and high role clarity,
and the difference between the two correlations
is nonsignificant. Similarly, for the
direction-satisfaction relationship, both
correlations are nonsignificant, and a
nonsignificant difference is also obtained
between the two.


Discussion 


The data reported in the results section
support the hypothesized usefulness of the
situational approach to classroom leadership.
The moderator variable of role clarity
allowed the relationships between perfor- 
mance and the leadership variables to 
emerge, which otherwise would have been 
obscured. As shown in Table 3, in the un- 
moderated condition the results indicate no 
significant relationships between the sup- 
port-performance and direction-perfor- 
mance variables. However, through the use 
of the situational or moderator design the 
results indicate that teacher leadership does 
exert influence on student performance, and 
that the nature of this influence depends 
upon the specific situation (role clarity) in- 
volved. 

The specific hypotheses derived from the 
path-goal theory received mixed support. 
As predicted, when students' roles are poorly 
defined (low role clarity), teacher directive- 
ness enhances student performance. Sup- 
port was also found for the predicted rela- 
tionship between support and performance 
under low role clarity (that teacher support 
is negatively related to performance). The 
hypothesized relationships between satis- 
faction and leadership, however, did not 
emerge. It was predicted (but not found) 
that under high role clarity, supportive 
teacher behavior would offset a lack of in- 
trinsic class satisfaction and lead to higher 
overall class satisfaction. The expectation 
that teacher directiveness would lead to 
student satisfaction under low role clarity 
was also unsupported. 

The exact meaning of this mixed support 
is unclear. Because of the study's correlational
nature, even perfectly predicted results would
have been ambiguous. When the subjects
were divided on their role clarity scores, an 
unknown number of other variables may also 
have been manipulated. The effect of these 
unknown variables on performance and 
satisfaction (or even on the perception of 
leadership) may have obscured relationships 
existing in the classes. Thus, while perfectly 
predicted results would clearly have been 
more desirable, the results obtained here and 
the literature cited in the introduction (on 
the generalizability of leadership findings to 
the classroom) are encouraging enough to 
warrant further investigation. Ideally, this 
would take the form of an experiment in 
which role clarity is manipulated, or a more 
closely controlled and more heavily instru- 
mented (measured) field (classroom) 
study. 


Conclusion 


Throughout this article it has been argued 
that researchers interested in examining 
teacher effectiveness from a leadership 
perspective should employ a situational 
approach, and problems inherent in treating 
teaching as leadership have not been dealt 
with in any detail. Thus, perhaps a good 
way to conclude this article is by briefly 
pointing out what seem to be three major 
problems in applying a leadership approach 
to the study of classroom teaching. 

First, it should be noted that fully ade- 
quate measures for use in teacher leadership 
research do not exist. For example, the in- 
struments used in this study are modified 
versions of scales used in industrial research. 
Thus, it is not certain that these modified 
instruments still measure the constructs 
they were intended to measure. Care and 
attention must therefore be devoted to the 
development of measurement instruments 
for use in future teacher leadership stud- 
ies. 

Second, even if adequate measures were
available for the study of teachers as leaders,
the uniqueness of the classroom setting may 
prevent their valid application in some in- 
stances. For example, it is not unusual for 
students to have a teacher for only one 
course. This might produce invalid or less 
valid teacher leadership descriptions, due to 
measurement instability (resulting from
inability of students to solidify their
impressions of teacher behavior before being
called upon to describe their teacher for a
research study). In industrial research, for
example, this is rarely a problem (because
employees generally have considerable
experience working under the same supervisor).
Thus, it may be that not all teaching
situations will be amenable to study as
leadership.

Finally, there is a lack of research and 
theory in this area, so that it is difficult to 
develop sound a priori hypotheses. The 
transfer of hypotheses from industrial ex- 
perience, for example, as done in this study, 
may not be valid, because of unknown dif- 
ferences in the nature of the environment 
facing students (as compared with industrial
employees). Thus, finding lack of support
for hypotheses derived from industrial
research should not discourage future teacher
leadership researchers, and a literature
dealing strictly with teachers as leaders must
be quickly developed.

In conclusion, then, it seems that the
study of teachers as leaders may prove
beneficial in advancing knowledge about
teaching effectiveness. A situational ap- 
proach to teacher leadership seems needed, 
and although there are problems involved in 
applying this approach, they do not seem
insurmountable. 

Facilitating Development of Four Moral Concepts Among
Kindergarten and First-Grade Children


Larry Jensen and Michael Murray
Brigham Young University


This article reports the results of a study of the effects of a training program
designed to facilitate moral development in four specific areas: independence
of sanctions, immanent justice, rules in games, and understanding punishment.
Included were 58 kindergarten and 60 first-grade children of both
sexes, divided equally into treatment and control groups. Children in the
treatment groups were read stories designed to stimulate discussion
concerning the solution to moral issues presented in the stories. All children were
individually pre- and posttested with questions following the presentation of
stories. The children in the experimental groups, compared with those in the
control groups, improved significantly (p < .01) following the brief training
program in three of the four areas: immanent justice, rules in games, and
understanding punishment.


A major concern in the area of moral
development is whether or not procedures exist
that can effectively accelerate the
development of mature moral judgments in young
children. Some researchers feel that brief
training programs can be effective in
significantly raising the level of maturity in
regard to specific moral concepts, while
others disagree (see Boyce & Jensen, 1978).
To provide the theoretical basis to
understand this disagreement in the area of moral
development and education, it is appropriate
to begin with the work of Jean Piaget.

Piaget (1965) distinguishes two basic
stages of morality. The first is the
heteronomous stage characterized by an almost
complete reliance on authorities’
constructions of right and wrong. The second, and
more mature stage of moral development,
Piaget calls autonomy. Instead of being
governed by another’s law, the individual at
this level of development becomes capable
of producing morality. Here morality
develops out of social intervention and mutual
respect among equals. In the autonomous
stage, the child, rather than obediently
accepting rules, actually structures them for
himself and conceives them as modifiable
according to mutual agreement, circum


take place without parallel advancement in
cognitive structures.

Another cognitive-developmental theorist,
Lawrence Kohlberg (1964), has developed a
model consisting of six sequential stages of
moral development. Like Piaget, Kohlberg
feels that moral training will not be
successful in raising the level of moral maturity
because of the child's dependence on natural
cognitive maturation, which, if not already
present, develops only slowly in a segmented
fashion.


can be a rapid shift or change from one stage
to another as would be required if direct
educational or training programs were
successful in moving children from a lower to
a higher performance level.

Other researchers have found results that
contradict this viewpoint. Bandura and
McDonald (1963) found evidence that did
not “substantiate Piaget’s theory of
demarcated sequential stages of moral
development” (p. 280). In their study they
trained 78 boys and 87 girls aged 5-11 years
from predominantly middle-class
backgrounds to make verbal discriminations
indicative of higher levels of moral reasoning.
This led them to conclude that Piaget’s
model could not be accurate if moral
judgments could be taught in such a short time
without directly stimulating cognitive
advancements.

Turiel (1966) and Crowley (1968) do not
agree with the anti-Piaget conclusions found
in the Bandura and McDonald (1963) study,
preferring the developmental-stage theory.
Interpreting Bandura’s research, Turiel and
Crowley concluded that subjects were only
learning a response they were being
specifically trained to make without any apparent
gains in the general level of their moral
maturity. Schleifer and Douglas (1973) found
that children aged 3-6 years can be trained
to develop genuine gains in moral judgments
rather than simply making superficial verbal
discriminations. This can be done within
a developmental structure by presenting the child
with alternating points of view which
stimulate the cognitive processes, “making it
more likely for him [the child] to progress to
the next sequential structure” (p. 66).

Like the above studies, an investigation by
Jensen and Larm (1970) indicates that brief
training programs produce significant
results. In their study two different
treatments were utilized in training sessions with
...-year-old children. One group was trained
by a reinforced discrimination procedure,
and the other was trained by a discussion
method, which required subjects to verbally
explain their answers. This explanation
provided insight into the moral reasoning of the
subjects, leading the experimenters to
believe that the subjects were not just making
an isolated social response but did indeed
understand the moral concept. This same
conclusion, that brief training programs can
significantly raise the level of a child's moral
maturity, has been reached in other studies
by Jensen and his associates (Jensen &
Hafen, 1973; Jensen & Hughston, 1971;
Jensen & A. Rytting, 1975; Jensen & M. 
Rytting, 1972; Jensen & Vance, 1972). 

Most of the studies discussed thus far have 
utilized a more direct approach in facilitating 
moral development, since training methods 
have employed procedures in which children 
have been directly reinforced for making a 
predetermined specific response. An indi- 
rect approach utilizing a discussion method 
is also possible, wherein subjects are not 
specifically reinforced for correct responses. 
The indirect approach has been shown to be 
as effective in facilitating children's under- 
standings of punishment as the direct ap- 
proach (Jensen & A. Rytting, 1975). Jensen, 
in his most recent writing, feels that indirect 
methods are preferable for both theoretical 
and practical reasons. 

The purpose of this investigation was to 
determine if brief training programs are ef- 
fective in advancing moral development. 
However, this goal is related to two impor- 
tant considerations, which presently lack 
empirical data. First, will some results be 
obtained outside the laboratory, in a natural 
setting such as an intact classroom? Sec- 
ond, can significant effects be produced 
using the indirect, discussion-type approach 
as opposed to a direct training program 
where the correct response is supplied, 
identified, and then reinforced directly? 
Jensen (1977) has developed indirect in- 
structional approaches to facilitate the de- 
velopment of moral judgments. His lessons 
form the basis of this investigation. Four 
specific moral concepts found in the litera- 
ture are used: consideration of rules in 
games, immanent justice, independence of 
sanctions, and understanding punishment. 
A discussion of each of these follows.¹


Consideration of Rules in Games 


Piaget (1965) first explored the
importance of rules in children’s games. He
determined that in the heteronomous stage of
development, children are rigid in their
observance of rules. Rules are fixed by an
authority, making it wrong for the child to
change them. As children approach and
enter the autonomous stage, they become
much more flexible, realizing that rules can
be changed or altered on the basis of mutual
agreement.


¹ Jensen's (1977) program is published by the
Brigham Young University Press and is designed for
kindergarten and first-grade children. It contains both
theory and lesson materials written to be self-instructive
for teachers of kindergarten and first-grade children.

Merchant and Rebelsky (1972)
hypothesized that kindergarten and first-grade
children become more flexible in their views
toward rules if they are given an opportunity
to participate in rule formation. As
expected, children in their laboratory learned
that rules can be changed for cooperative
purposes by mutual agreement.


Immanent Justice 


Immanent justice, as defined by Piaget 
(1965), is the belief that some misfortune will 
automatically follow any wrongdoing. Pi- 
aget’s findings led him to believe that a 
child's belief in immanent justice decreases 
with age. Other studies have supported his
findings (Abel, 1941; Dennis, 1943; Lerner,
1937; Liu, 1950). Medinnus (1959) criticizes
these studies, since they utilize just one story
at all age levels in determining children with
a belief in immanent justice. Medinnus
maintains that there are several factors
which will influence the response of a child
to a particular story. Jensen and M. Rytting
(1972) concur with Medinnus in this regard
as they state that “the amount of immanent
justice is dependent upon the amount of
causal information and the amount of
relatedness in the moral dilemma” (p. 95). The
materials used in this investigation meet the
criteria specified by the above investigators.


Independence of Sanctions 


According to Piaget (1965), independence 
of sanctions refers to the ability to judge an 
act as either being right or wrong indepen- 
dent of the sanction or consequence that 
follows that act. Piaget maintains that 
morally mature children will respond that a 
good act followed by punishment is still
indeed a good act, whereas the less mature
child will feel that the act is bad. Kohlberg 
(1964) states that not until about 7 years of 
age do most children show independence of 
sanctions. Jensen and Hughston (1971) 
found that following a brief training program 
in their laboratories, 4- and 5-year-old chil- 
dren were able to make more mature moral 
judgments in regard to independence of 
sanctions. 


Understanding Punishment 


Piaget (1965) explains the difference be- 
tween two types of punishment: retribution 
and reciprocity. Retribution is punishment 
prompted by the motive of revenge, along 
with a reliance upon adult authority. On the 
other hand, punishment by reciprocity is a 
concept of fairness and justice based upon 
another person’s needs and point of view. 
Piaget believes that children in the
heteronomous stage regard retribution as the
punishment that should be employed, but
as they progress to the autonomous stage,
punishments using reciprocity are
considered. Jensen and A. Rytting (1975)
conclude that children in their study aged 3-5
who were exposed to training dealing with
the efficacy of reciprocity were able to make
more mature judgments regarding
punishment when posttested.

As has been reviewed, previous
experimenters have had success in training young
children, ages 3-6, to make moral judgments
indicative of an increase in their level of
moral maturity, or, in Piagetian terms, a
change from the heteronomous stage to the
autonomous stage of moral reasoning. It
appears, however, that the evidence is not
conclusive in determining whether the stage
of development has actually been increased
or whether a verbal discrimination has been
reinforced and learned.

The present study bears upon this issue as
it attempts to determine the effect that brief
training programs have on the moral
development of young children. It is felt by the
authors that the indirect approach (as
opposed to the direct approach) offers more
conclusive evidence about the nature of
moral change. If the correct response is
not identified by the adult yet the child later
responds by identifying the more mature
concept, then it is less likely that a simple
reinforced verbal discrimination has been
trained. Besides utilizing an indirect ap- 
proach, this study differs in two other re- 
gards. Prior experimentation has been 
conducted in restricted laboratory settings, 
whereas the present program has been de- 
veloped for use in the classroom, a natural 
context. Also, unlike other studies exam- 
ining only one area of moral development, 
subjects in this study were trained in four 
different, though related, areas of moral 
judgment. More specifically, children from 
two kindergarten classes and two first-grade 
classes were taught lessons on rules in games, 
immanent justice, independence of sanc- 
tions, and understanding punishment during 
a 4-week period in March and April of 1977.
They then were tested to determine the ef- 
fectiveness of the program. 

The investigators hypothesized that the 
kindergarten and the first-grade subjects in 
the treatment groups would make significant 
advances in their levels of moral reasoning 
from the pretest to the posttest in compari- 
son with the control groups. Also, it was 
expected that first graders would score 
higher on the pretest than the kindergarten 
children, not only because of
developmental-stage theory, but because of sociocultural
variables such as maturation, ability to
concentrate, and experience with school and
tests. For the same reasons it was
anticipated, among the treatment groups, that the
older children would show more significant
rates of improvement unless their initial
means were too high, producing a ceiling
no differences in regard to the variable of sex 
(Hoffman, 1977; Jensen & Vance, 1972). 


Method 
Subjects 


Initially, the sample consisted of 58 kindergarten
children and 62 first-grade children from a public
elementary school located in Mapleton, Utah. Subjects
were predominantly white, middle class, and Mormon.
Two first-grade subjects were not available for
posttesting, narrowing the final sample to 58 kindergarten
and 60 first-grade children. Of these, there were 65
male and 53 female children: 33 male and 25 female
kindergarteners and 32 male and 28 female first
graders.


Materials 


Subjects in the training groups were given lessons
from Jensen's (1977) manual consisting of short
illustrated stories. Additional materials, including balls,
cans, egg cartons, marbles, boxes, spools, and coins, were
used for the lesson on rules. Teachers were asked to
read the first 58 pages of the training manual covering
the topics of moral education, the development of moral
thought, and appropriate classroom atmosphere.

The pretest consisted of stories similar to those
discussed in the training sessions, followed by questions
similar to a forced-choice format. The posttest utilized
the same stories and questions as the pretest. One
example of each of the four types of stories is given
below.

Stories dealing with rules in games were similar to the
following example: Sara and Bill were throwing a bean
bag and trying to knock over a pile of blocks. Would
it be fair or would it be wrong if both Sara and Bill
agreed to move closer, making it easier to knock over the
blocks?

Stories dealing with immanent justice were similar
to the following example: There was once a little boy
who didn't mind his daddy. His daddy told him never
to ride his big brother's bike. But one day when the
little boy was outside all alone, he decided that it would
be a lot of fun to ride his big brother's bike. While he
was riding the bike, he fell off and skinned his arm and
his knee. If his daddy had said it was all right to ride
his big brother's bike, would he still have fallen and hurt
himself?

Stories dealing with independence of sanctions were
similar to the following example: Two children were
playing together and sharing their toys. Their mother
got mad and spanked them. Was sharing their toys
good or bad?

Stories dealing with understanding punishment were
similar to the following example: Billy got mad at his
little brother, Johnny, and broke Johnny's truck. What
would be fair punishment for Billy? (a) Make Billy give
Johnny one of his toys. (b) Break one of Billy's toys on
purpose.


Procedure


Pretesting. All children present in the kindergarten 
and first-grade classes on the days of the testing were 
read 10 stories from each of the four areas previously 
discussed, resulting in a total of 40 stories. Standard 
procedure was to tell each child that he or she would be 
read some stories, and after each one, they would be 
asked a question. Following each story, subjects were 
asked a forced-choice question, and the examiner circled 
the response given on an answer sheet.

Two examiners were given training to insure that the
stories were presented in the same manner and that 
questions asked by the subjects were resolved in the 
same way. However, the examiners were unaware of 
the design and the objectives of the investigation. The 
order of the 40 test questions was determined by
randomization and was the same for all subjects. The
length of testing was approximately 15 minutes for each
child. Scoring of the answer sheets was done later by
counting and recording the correct number of answers
in each of the four areas.
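
The scoring step described above (counting correct answers within each of the four areas) can be sketched in a few lines. This is only an illustration: the item numbers, area labels, and answer key below are hypothetical, since the article does not reproduce the actual answer sheet.

```python
# Hypothetical area labels; the article names the four areas but not a coding scheme.
AREAS = ("rules", "immanent_justice", "sanctions", "punishment")


def score_answer_sheet(responses, answer_key):
    """Count correct answers per moral area.

    responses:  dict mapping item number -> choice circled by the examiner
    answer_key: dict mapping item number -> (area, more mature choice)
    """
    totals = {area: 0 for area in AREAS}
    for item, (area, correct) in answer_key.items():
        if responses.get(item) == correct:
            totals[area] += 1
    return totals


# Illustrative three-item key (the real test had 40 items, 10 per area).
key = {1: ("rules", "fair"), 2: ("rules", "fair"), 3: ("punishment", "a")}
sheet = {1: "fair", 2: "wrong", 3: "a"}
print(score_answer_sheet(sheet, key))
```

With the full 40-item test, each area score would range from 0 to 10, matching the maximum score noted in the tables.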

Training sessions. Before beginning training, 
subjects in the morning kindergarten class and subjects 
in the afternoon kindergarten class were ranked sepa- 
rately and placed into matched pairs according to their 
total score on the pretest. Individuals from the 
matched pairs were then randomly assigned to either 
control or experimental groups. The same procedure 
was followed for the first-grade children with the ex- 
ception that the two classes were combined for purposes 
of ranking, matching, and assigning. 
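
The ranking, matching, and random assignment procedure just described can be sketched as follows. This is a minimal illustration, not the authors' actual procedure in detail: the function name, the child identifiers, and the handling of an odd child left over after pairing are all assumptions, because the article does not say how ties or unpaired subjects were treated.

```python
import random


def assign_matched_pairs(pretest_totals, seed=None):
    """Rank subjects by pretest total, pair adjacent ranks, and randomly
    send one member of each pair to each group.

    pretest_totals: dict mapping subject id -> total pretest score
    """
    rng = random.Random(seed)
    ranked = sorted(pretest_totals, key=pretest_totals.get, reverse=True)
    groups = {"experimental": [], "control": []}
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random order decides who gets the treatment
        groups["experimental"].append(pair[0])
        groups["control"].append(pair[1])
    if len(ranked) % 2:
        # Assumption: an unpaired subject is assigned at random.
        groups[rng.choice(["experimental", "control"])].append(ranked[-1])
    return groups


# Hypothetical pretest totals for six children in one class.
totals = {"a": 38, "b": 35, "c": 30, "d": 29, "e": 21, "f": 20}
print(assign_matched_pairs(totals, seed=1))
```

For the kindergarten subjects the function would be run once per class (morning and afternoon), and for the first graders once on the combined classes.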

Subjects in the experimental groups received lessons 
the week following the pretesting. They were given a 
different lesson each week for 4 consecutive weeks. 
Training sessions lasted 15-20 minutes and were
presented on 3 consecutive days during the week.

The lessons were presented to the kindergarten 
subjects by the kindergarten teacher, who was the same 
for both the morning and afternoon groups. Likewise, 
all of the lessons were presented to the first-grade group 
by one of the first-grade teachers. Prior to the treat- 
ment sessions, the two teachers read material (described 
previously) concerning moral development. At the
beginning of each of the 4 weeks, the experimenter met
with both teachers individually to discuss the lesson and
the materials to be presented that particular week.
This meeting, along with written instructions
accompanying each lesson, informed the teachers of the
method of instruction to be followed. 

Basically, this method involved reading a series of 
short stories to the children while at the same time 
showing them pictures illustrating the stories. Fol- 
lowing each story, the teacher asked questions to en- 
courage discussion concerning the solution to the sit- 
uation presented (i.e., punishment to be used, fairness 
of an act, value of an act). The teacher encouraged 
discussion, clarified responses, and presented any al- 
ternatives not mentioned in the discussion, attempting 
not to reinforce or identify the most mature response. 
This procedure was followed for all lessons except for
the one on rules in games. 

In the lesson on rules, subjects were read a story about 
children who used various objects to make up their own 
games. In following sessions, subjects were divided into
groups of 4 or 5 and given various objects, as listed
earlier, and were told to make up their own games. As
subjects were engaging in this activity, the instructor
went among them encouraging discussion concerning 
rule formation and the fairness of changing rules already 
set. During the last session of this lesson, subjects were 
gathered together, and a discussion was held concerning 
the experience of rule formation, with the teacher once 
again following the predetermined guidelines for dis- 
cussions. 

While the treatment groups were receiving the les- 
sons, individuals in the control groups were given extra 
recess time outside of the classroom. 

Posttesting. Different examiners were trained in the 
same manner as those who administered the pretest. 
The posttest was given the week following the
presentation of the fourth lesson.


Results


A repeated measures analysis of variance
was used to analyze the data. The
experimental design model used was

Y(IJKL) = S(I) + G(J) + R(K) + T(L) + SG(IJ) +
SR(IK) + ST(IL) + GR(JK) + GT(JL) +
RT(KL) + SGR(IJK) + GRT(JKL) +
SRT(IKL) + SGT(IJL) + SGRT(IJKL) + E,

wherein S represents sex, G represents grade
(kindergarten and first grade), R represents
group (experimental and control), T represents
test (pre- and posttest), and E represents
error. Main effects of grade, group, and test
were found as well as interactions between
group and test, sex and test, sex and grade,
and sex, grade, and test. Mean scores are
presented in Tables 1, 2, and 3 for all
significant interactions. To determine the
significance of the difference between grade
levels on the pretest, t tests were used because
the main effect of the F test for grade included
both the pre- and posttests. The four dependent
variables were examined separately, and the
specific results for each dependent variable
are discussed below.
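
As a rough check on the design model, the terms of a full four-factor factorial can be enumerated mechanically, and the error degrees of freedom can be compared with the reported F(1, 220). The sketch below assumes that each of the 118 children contributes two scores (pretest and posttest) and that all 236 scores are treated as separate observations across 16 cells; that reading is an inference from the reported degrees of freedom, not something the article states, and a conventional repeated measures analysis would partition the error differently.

```python
from itertools import combinations

# Two levels per factor: S = sex, G = grade, R = group, T = test.
FACTORS = {"S": 2, "G": 2, "R": 2, "T": 2}


def model_terms(factors):
    """List every main effect and interaction of a full factorial design."""
    names = list(factors)
    return [
        "".join(combo)
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
    ]


def error_df(n_subjects, n_tests, factors):
    """Error df if every score counts as one observation (an assumption)."""
    cells = 1
    for levels in factors.values():
        cells *= levels
    return n_subjects * n_tests - cells


print(model_terms(FACTORS))       # 4 main effects + 11 interactions
print(error_df(118, 2, FACTORS))  # 118 * 2 - 16
```

The enumeration yields the same 15 terms that appear in the model equation, and the error degrees of freedom under this assumption equal the 220 reported with each F ratio.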


Independence of Sanctions 


A significant main effect for grade was
found, F(1, 220) = 8.70, p < .01 (kindergarten
M = 7.18; first-grade M = 8.51). However,
a higher order interaction occurred
between grade and sex, F(1, 220) = 4.71, p <
.05. Examination of Table 1 shows that the
Sex X Grade interaction occurred not only
because first-grade children scored higher
than kindergarten children but more
specifically because kindergarten males scored
so low. The first-grade children scored
higher on the pretest than the kindergarten
children, t(116) = 8.63, p < .01 (first-grade
M = 8.33; kindergarten M = 7.64).


Table 1
Mean Pretest and Posttest Scores for All
Significant Sex X Grade Interactions

Moral area and sex            SD     Kindergarten   First grade
Independence of sanctions     3.41
  Male                                   6.41           8.11
  Female                                 7.95           8.30
Understanding punishment      1.99
  Male                                   6.78           8.65
  Female                                 7.34           8.04

Note. Maximum score = 10.


Table 2
Mean Scores for All Significant Treatment
Group X Pretest-Posttest Interactions

Moral area and group          SD     Pretest   Posttest
Immanent justice              3.38
  Control                              3.74      4.67
  Experimental                         4.16      8.96
Rules in games                2.91
  Control                              3.63      3.36
  Experimental                         4.59      6.04
Understanding punishment      1.99
  Control                              7.57      7.49
  Experimental                         7.21      8.54

Note. Maximum score = 10.


Immanent Justice


Main effects for group, F(1, 220) = 27.99,
p < .01 (control M = 4.21; experimental M
= 6.56), and for test, F(1, 220) = 41.39, p <
.01 (pretest M = 3.95; posttest M = 6.82),
were found. The higher order interaction,
Group X Test, F(1, 220) = 18.84, p < .01, will
be discussed further. As can be seen by
reference to Table 2, the experimental group
scored significantly higher than the control
group on the posttest.


Table 3
Mean Scores for Significant Sex X Grade X
Pretest-Posttest Interaction on Rules in Games

Sex and grade     Pretest   Posttest
Kindergarten
  Males             3.19      3.93
  Females           3.82      4.64
First grade
  Males             6.00      4.43
  Females           3.42      5.80

Note. Maximum score = 10. SD = 2.91. Data are for
experimental and control group subjects.


Rules in Games


Main effects for grade, F(1, 220) = 7.05, p
< .01 (kindergarten M = 3.90; first-grade M
= 4.91), and for group, F(1, 220) = 22.65, p
< .01 (control M = 3.49; experimental M =
5.32), were found. An interaction was found
between sex and test, F(1, 220) = 6.92, p <
.01. The means for males on the pretest and
the posttest were 4.60 and 4.18, respectively,
and for females on the pretest and the
posttest were 3.62 and 5.22, respectively. This
interaction and both main effects can be
explained by analyzing the higher order
interactions that occurred. Group and test
interacted, F(1, 220) = 5.00, p < .05,
indicating that the experimental groups
improved following the training. Mean scores
are recorded in Table 2. Also, there was a
Sex X Grade X Test interaction, F(1, 220) =
6.34, p < .05. As can be seen by inspecting
the mean scores in Table 3, this interaction
was due to a high pretest score by male first
graders followed by a significantly lower
posttest score. The t test confirmed that
children in the first grade scored higher than
those in the kindergarten, t(116) = 16.12, p
< .01 (first-grade M = 4.71; kindergarten M
= 3.51).


Understanding Punishment 


The main effects of grade, F(1, 220) =
23.97, p < .01 (kindergarten M = 7.06;
first-grade M = 8.32), and test, F(1, 220) =
4.95, p < .05 (pretest M = 7.39; posttest M
= 8.01), must be qualified and explained in
terms of higher order interactions. Sex
interacted with grade, F(1, 220) = 4.95, p <
.05. Examination of Table 1 shows the
reason to be the same as it was for Sex X
Grade for the dependent variable,
independence of sanctions. Kindergarten subjects
scored lower than first-grade subjects due to
the low scores of the male kindergarten
children. The interaction, Group X Test,
found in the last two areas, was again found
for this dependent variable, indicating that
the experimental subjects developed
significantly in their understanding of
punishment as compared with the control
subjects, F(1, 220) = 7.29, p < .01. Mean
scores can be found in Table 2. Again, the
results of the t test showed that the older
children scored higher on the pretest than
the younger children, t(116) = 26.00, p < .01
(first-grade M = 8.04; kindergarten M =
6.74).


Discussion 


The main hypothesis of this investigation
was that a brief training program
would raise the level of moral reasoning in
each of four specific areas. The results
indicate that in three of the four areas
(immanent justice, rules in games, and
understanding punishment) children in the
experimental groups made significant gains
over those in the control groups following the
presentation of material designed to produce
change. Only in the area of independence
of sanctions did the results fail to support the
hypothesis.

The second hypothesis, which stated that 
first-grade children would score higher on 
the pretest than those children in kinder- 
garten, proved to be true for all four con- 
cepts. 

Next, it was expected that the first-grade 
subjects in the experimental groups would 
improve more than the kindergarten 
subjects in the experimental groups. This 
was not confirmed for any of the four con- 
cepts, as a Grade X Group X Test interaction 
did not appear in the analysis. 

Finally, the last hypothesis stated that the 
variable of sex would have no effect. There 
were no main effects involving sex; however, 
the results showed that in interaction with 
other variables, sex was a significant factor. 
In the areas of independence of sanctions 
and understanding punishment, sex inter- 
acted with grade. In both cases first-grade 
children scored higher than kindergarten 
children due to the low scores of kindergar- 
ten males. 

The present results indicate that brief 
training programs can be effective in raising 
the level of moral maturity in young children 

just as other studies have shown (Bandura 
& McDonald, 1963; Crowley, 1968; Jensen & 
Vance, 1972; Schleifer & Douglas, 1973; 
Turiel, 1966). These results are not well 
received by developmental-stage theorists 
who argue that brief training programs have 
little effect in advancing the underlying 
structure of children's moral reasoning.
These theorists explain such training suc-
cesses as being due to the reinforcement of
specific responses. Care was taken in the
present study not to reinforce any particular
type of response. Instead, children were
taught these concepts through a very indi-
rect approach; children participated in dis-
cussions prompted by stories read to them
by an instructor. Correct answers were not
directly reinforced. It was through this in-
teraction that alternative solutions were of-
fered and learning occurred. However, this
development seems to be similar to the way 
in which Piaget (1965) theorized that chil- 
dren could progress from the heteronomous 
to the autonomous stage. He believed that 
this development would be aided by peer
interaction that provided an atmosphere of 
freedom and openness. However, Jensen 
(1977) proposes that it is not the peer inter- 
action per se that is the essential element in 
moral growth; rather, it is experiences in- 
volving reciprocity, openness, a sense of 
equality, and the resolution of conflict that
could take place in other nonpeer social 
contexts. If social interaction occurs in this 
type of atmosphere, then arriving at the
solution to a situation or a problem will elicit
cognitive disequilibrium whereby individu-
als move to more mature levels of moral
reasoning. To elaborate further, cognitive
disequilibrium occurs when different levels
of reasoning or new information are pre-
sented to an individual, and he experiences
some type of mental discomfort, which in-
fluences the development of reasoning
abilities (Biskin & Hoskisson, 1974; Jensen, 
1977). 

It is felt that the process described above 
occurred in the three areas where the 
training proved to be successful. In the 
fourth area, independence of sanctions, 
change due to the training did not occur. 
This differs from the results obtained by 
Jensen and Hughston (1971), since those 
investigators found that 4- and 5-year-old
children were able to make gains in the area 
of independence of sanctions following 
training. A possible explanation for the 
results of the present study is that the 5- and
6-year-old children in the treatment groups
averaged better than 8 out of 10 correct [...]
of 5 and 6, were apparently already
reasoning at the higher level of morality in
regard to the concept of independence of
sanctions than what has been assumed to be
the case by some developmental theorists
(Kohlberg, 1964). Prior research by the
authors also found advancement in com-
prehension of this concept (Jensen &
Hughston, 1973).

It was hypothesized that the older chil-
dren would make larger improvements than
the kindergarten children. One reason for
this hypothesis is the belief that the older
children are even closer to the age at which
the change from heteronomous to autono-
mous reasoning occurs and are more likely
to make significant advancements in moral
development; however, this did not occur.
Instead, kindergarten children improved
approximately the same amount in each of
the four areas as first graders did.

The fact that sex differences occurred in
this study also tends to support the effect of
social factors on the development of moral
reasoning. So far developmental theorists
have not identified sex differences in the
development of moral reasoning. In the areas
of understanding punishment and inde-
pendence of sanctions, male kindergarten
children scored significantly lower than the
female children and the older male children.
It is felt that these results were due to mat-
uration and other sociocultural variables
correlated with male socialization during
these years. In regard to understanding
punishment, the low scores by male kinder-
garten children and the high first-grade
scores of the males may be due to an in-
creased sensitivity of male children to the
[...] of punishment. Researchers have
consistently found differences in the social-
ization of males and females, with males
more aggressive and sensitive to con-
[...] of punishment (Hetherington & Parke,
[...]; Johnson, 1974).

The sex differences in the unusual inter-
action in the area of rules, in which first-
grade males scored significantly high on the
pretest and then decreased on the posttest
to the level of the others, is difficult to ex-
plain. It seems as though males at the
first-grade level when first presented with 
the stories reacted to and interpreted the 
situations differently from all other groups. 
This occurrence might be based on their 
previous experience in similar social situa- 
tions that may have produced some rigidity. 
Perhaps the reason for the male first graders 
showing less understanding following the 
training is due to the occurrence of chance or 
to the effect of uncontrolled secondary 
variables not apparent to the investiga- 
tors. 

The effectiveness of a brief training pro- 
gram in facilitating growth in several areas 
of moral development is considered the most 
conclusive finding of this study. It is also 
important to note that the development oc- 
curred following training that was conducted 
by classroom teachers rather than research 
investigators. In addition, the research was 
within the natural context of the educational 
system, rather than a more isolated research 
laboratory typical of previous investiga- 
tions. 

Previous research has demonstrated that children's literature frequently pre-
sents girls and women only in limited, "traditional" roles, with the result that
girls exposed to such literature may limit their own self-perceptions and aspi- 
rations. In an experiment with fourth graders, 29 girls were read two stories 
with women in traditional roles or two with women in nontraditional roles. 
Attitude changes were measured by a picture-choice test, two job checklists, 
and two adjective checklists. As predicted, girls who heard nontraditional 
stories rated traditionally male jobs and characteristics as appropriate for fe- 
males more than girls who heard traditional stories. These results underline 
the importance of nonsexist books and textbooks in widening girls’ aspirations
and self-images.


Much attention has been given in recent 
years to the issue of sexism in children’s 
books. Social learning theory predicts that
children learn what constitutes sex-appro- 
priate behavior from the sex role expecta- 
tions and role models they observe around 
them. The books they read, both in and out 
of school, provide a major source of role 
models (Frasher & Walker, 1972). If these 
models show women in limited, stereotyped 
roles, girls may tend to limit their own aspi-
rations.

Stull (Note 1) examined books that appeal
to older children, including Newbery award
winning books. She found that many, but
not all, of the books presented girls and
women only in limited, traditional roles.
Newbery award winners were no better in
this respect than books that had won no
awards. Older books were more likely than
recent books to present sexist images of fe-
males.

A study by Hillman (1974) compared
children’s books written in the 1930s and in
the 1970s. She found that books written in
the 1970s have more female characters than
those written in the 1930s, but female char-
acters are still greatly outnumbered by
males.

Frasher and Walker (1972) examined 
widely used reading textbooks and found 
that males outnumber females by a large
margin. Few of the females work outside
the home, and those who do hold only tra- 
ditionally female jobs. Fathers hold the 
position of family leadership. Fathers are 
shown mainly outdoors, while mothers are 
indoors. Girls are shown engaged in more 
quiet activities than boys. 

A large number of studies point out the 
widespread sexism that exists in children's 
books and predict that this influences girls’ 
self-images and aspirations. There is evi- 
dence that positive outcomes result from 
exposing children to nonstereotypical 
stories. Litcher and Johnson (1969), using 
multiethnic reading textbooks, succeeded in 
changing the attitudes of white school chil- 
dren toward blacks. McArthur and Eisen 
(1976) obtained more achievement behavior 
from nursery school girls who had heard a 
story about an achieving girl than those who 
heard about an achieving boy. This suggests 
that the content of reading books is impor- 
tant in influencing children’s attitudes. By 
changing the content of the books children 
are exposed to, one may hope to change their 
attitudes toward themselves and others. 


American Psychological Association, Inc. 0022-0663/78/7006-0945$00.75
The current study was undertaken to de-
termine the effects on girls of stories that
portray women in nontraditional occupa- 
tions. It was predicted that girls who were 
exposed to such women would perceive 
typically male jobs as more attractive than 
girls who were read stories about women in 
traditional occupations. It was also pre-
dicted that these girls would judge typically
male adjectives to refer to both males and 
females more than girls read traditional 
stories. 


Method 


Subjects 


Subjects were 137 fourth-grade students in two public 
schools. Six experimental groups were formed using 
all the students from the three fourth-grade classrooms 
at Alfred-Almond Central School, Almond, New York, 
a total of 66 students. Within each room students were 
assigned to traditional and nontraditional story groups. 
A total of 16 females and 16 males were assigned to the 
traditional groups and 18 females and 16 males to the 
nontraditional groups. The experimenter was inter- 
ested mainly in the data from the girls, although the 
data from boys were also examined. Five girls and four 
boys were absent for one or both of the treatment ses- 
sions, and their data were not included in the results. 
This left 15 girls and 16 boys in the nontraditional group
and 14 girls and 12 boys in the traditional group. Two
girls and five boys were absent for one of the posttests;
their data were included in the results for the other
posttests.

The control group consisted of 71 students in three 
fourth-grade classrooms at Clinton Elementary School, 
Clinton, New York. Both schools are located in rural 
college towns. The student body in each is diverse, 
including middle- and lower-class students, faculty 
children, store owners’ children, and farmers’ children. 
Most students in both schools are white. 
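
The subject counts reported above can be cross-checked with a short tally. The sketch below is only an illustration: the numbers are transcribed from this section, while the dictionary layout and the helper name `total` are assumptions introduced here and are not part of the original study.

```python
# Counts transcribed from the Subjects section; the layout is illustrative only.
assigned = {
    "traditional": {"girls": 16, "boys": 16},
    "nontraditional": {"girls": 18, "boys": 16},
}
retained = {
    "traditional": {"girls": 14, "boys": 12},
    "nontraditional": {"girls": 15, "boys": 16},
}
CONTROL_N = 71  # students in the control school


def total(groups, sex=None):
    """Sum counts over story groups, optionally restricted to one sex."""
    return sum(
        count
        for group in groups.values()
        for key, count in group.items()
        if sex is None or key == sex
    )


experimental_n = total(assigned)                                  # 66 students
girls_dropped = total(assigned, "girls") - total(retained, "girls")  # 5 girls
boys_dropped = total(assigned, "boys") - total(retained, "boys")     # 4 boys
girls_analyzed = total(retained, "girls")                         # 29 girls
overall_n = experimental_n + CONTROL_N                            # 137 students

print(experimental_n, girls_dropped, boys_dropped, girls_analyzed, overall_n)
```

Under these counts, the 29 girls retained in the two story groups agree with the number given in the abstract.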


Materials 


Four stories were used, two with women in traditional 
careers (telephone operator and nurse) and two with 
women in nontraditional careers (television director and 
veterinarian). The stories were based on the following 
children’s books: Linda Goes to a TV Studio, by 
Nancy Dudley (1957); I Know a Telephone Operator, 
by J. A. Evans (1971); What Can She Be? A Veter-
inarian, by Gloria and Esther Goldreich (1972); and
The First Book of Nurses, by Eleanor Kay (1968). The 
stories were edited or rewritten to fit the purposes of the 
experiment. For example, the sex of the director in 
Linda Goes to a TV Studio had to be changed. After
rewriting, the stories were of the same length and ap- 
proximately the same complexity. 

The posttests used included a picture-choice test, two
adjective checklists, and two job checklists. The pic-
ture-choice test was given only to the experimental
groups, and the job and adjective checklists were given
to both the experimental and control groups.

The picture-choice test consisted of eight pairs of
photographs. Each pair had a photograph of a woman
in a traditional role and a woman in a nontraditional
role (e.g., a woman watering plants and a female tele-
phone lineperson). The experimenter decided whether
the job was traditional or nontraditional. As much as
was possible, the size of the pictures and the at-
tractiveness, and facial expressions of the women were
the same in each pair. Black-and-white pictures were
paired with black-and-white pictures, and color, with
color. Girls were asked which woman in each pair they
would rather be, and boys were asked which woman
they thought was happier in what she was doing.

Two adjective checklists were used, each with


Table 1 
Adjective Checklist 


Adjective 


Able 

Bullying 

Cheerful 

Clever 

Confident 

Cool 

Doesn’t trust people 
Does things without planning them 
Feelings are easily hurt 
Gentle 

Graceful 

Helpful 

Is very careful 

Kind 

Likes art and music 
Likes many things 
Likes sports 

Likes the outdoors 
Likes to ask questions 
Likes to be told what to do 
Likes to do things on one’s own 
Nervous 

Noisy 

Plans ahead 

Polite 

Rough 

Selfish 

Shows one’s feelings 
Shy 

Slow 

Smart 

Soft-hearted 

Strong 

Timid 

Tough 

Trusting 

Wants to get a lot done 
Warm 
a Ratings by control group; M = male; F = female; B = both.

words or phrases. Words were selected from the ad-
jective checklist used by Berdie (1959) and simplified
for fourth graders (e.g., “skeptical” became “doesn’t 
trust people”). Students were asked whether each word 
or phrase described a male, a female, or both. A list of 
the adjectives used appears in Table 1. 

Two job checklists were used, each with 25 jobs. 
Thirteen of the jobs appeared on both checklists, in- 
cluding the jobs of the women in the stories. The jobs 
were listed in a nonsexist manner (e.g., “mail carrier” 
instead of “mailman”), and certain common jobs were 
not used because they could not be both nonsexist and 
understandable to fourth graders (e.g., “stewardess” is 
sexist, and most fourth graders have never heard of a 
“flight attendant”). Students in the experimental 
groups were asked to rate each job from 1 to 5 according 
to how much they thought they would enjoy doing it. 
Control group students were asked whether each job 


Table 2 
Job Checklist 


Job Rating 
Army officer 
Artist 
Athlete 
Carpenter 
College Professor 
Dancer 
Dentist 
Doctor 
Factory worker 
Fire fighter
Gas station attendant
Homemaker
Judge
Lawyer
Librarian
Mail carrier
Mechanic
Minister
Movie star
Musician
Newspaper reporter
Nurse
Parent
Pilot
Plumber
Police officer
Principal
Restaurant worker
Sales clerk
Scientist
Secretary
Senator
Teacher
Telephone operator
Television director
Veterinarian
Writer 
Ratings by control group; M = male; F = female; B = both.

was done by males, females, or both. A list of the jobs
used appears in Table 2.

The adjectives and jobs were read one at a time to the
students. It was thought that students would recognize 
more words by listening than they would if they were 
able to read the words themselves. 


Procedures 


Students in the experimental groups were told that 
the experimenter was interested in what kinds of stories 
fourth graders liked and that the stories were written 
by a friend of hers. Each group heard two stories, one 
in each of two sessions held outside the students’ 
classrooms on consecutive days. Half of the groups 
heard the two traditional role stories, while half heard 
the two nontraditional role stories. After each story, 
students filled out a questionnaire asking what they 
liked and didn’t like about each story, since ostensibly 
this was the experimenter’s interest. The day after 
each group had heard the second story, the classroom 
teachers had the students fill out the first adjective and 
job checklists in class without the experimenter's 
presence. The experimenter then met with each stu- 
dent individually to administer the picture-choice test. 
The classroom teachers gave the second adjective and 
job checklists a week after the first. 

The experimenter administered the job and adjective 
checklists to the control group. Students were told that 
the experimenter had given the checklists to fourth 
graders in another school and wanted to see how it 
compared with this school. 


Results 


The results generally supported the pre- 
diction that girls in nontraditional story 
groups would select more nontraditional 
pictures, jobs, and adjectives than girls in 
traditional story groups. On the picture- 
choice tests, nontraditional girls picked sig-
nificantly fewer traditional pictures than
traditional girls (5.1 out of 8 compared with 
6.2). Using a one-tailed test of significance, 
t(27) = 1.741, p < .05. 

On the job checklists, there was a slight 
tendency for the traditional girls to rate the 
female jobs more positively, t(8) = 2.06, p < 
.05, one-tailed test. This effect was not ev- 
ident on the second testing. There were no 
differences in the ratings of male jobs. 

Words on the adjective checklist were 
classified as male, female, or both on the 
basis of the ratings made by the control 
group. A numerical score for each adjective 
was obtained by giving 2 points for each child 
who rated the word as the control group had, 
1 point for every “both” rating, and −2 points
for every rating opposite that of the control
group. The scores for each adjective were 
then summed for each group. 
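
The scoring rule just described can be sketched in a few lines. This is a minimal illustration, not the authors' procedure as implemented: the function names, the single-letter codes ("M", "F", "B"), and the sample data are assumptions introduced here. The text does not say how a word that the control group itself classified as "both" was scored, so the sketch handles only words classified as male or female.

```python
# Hypothetical rating codes: "M" = male, "F" = female, "B" = both.
OPPOSITE = {"M": "F", "F": "M"}


def adjective_score(control_label, ratings):
    """Score one adjective: +2 per rating matching the control group's
    classification, +1 per "both" rating, -2 per opposite rating."""
    if control_label not in OPPOSITE:
        # Assumption: words the control group rated "both" are not covered.
        raise ValueError("control_label must be 'M' or 'F'")
    score = 0
    for rating in ratings:
        if rating == control_label:
            score += 2
        elif rating == "B":
            score += 1
        elif rating == OPPOSITE[control_label]:
            score -= 2
        else:
            raise ValueError(f"unknown rating: {rating!r}")
    return score


def group_score(control_labels, responses):
    """Sum the adjective scores of one story group over all adjectives."""
    return sum(
        adjective_score(label, responses[word])
        for word, label in control_labels.items()
    )


# Made-up ratings from a four-child group for two adjectives.
control_labels = {"Strong": "M", "Gentle": "F"}
responses = {
    "Strong": ["M", "B", "F", "M"],   # 2 + 1 - 2 + 2 = 3
    "Gentle": ["F", "F", "B", "B"],   # 2 + 2 + 1 + 1 = 6
}
print(group_score(control_labels, responses))
```

With this rule, a higher group total means the group's ratings agree more closely with the control group's sex typing of the words, that is, more stereotyped ratings.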

On both checklists, the nontraditional 
females’ ratings of the adjectives were less 
stereotyped: first list, t(30) = 3.19, p < I;
second list, t(16) = 1.85, p < .05, one-tailed
tests. 

On the picture-choice test, males who 
heard nontraditional stories picked slightly 
more traditional pictures than males who 
heard traditional stories, contrary to pre- 
diction. This difference was not significant, 
t(26) = .28, ns. There were no differences 
between the groups on the adjective check- 
lists. The results of the job checklists were
judged to be irrelevant for males, since they
reflected the boys’ attitudes toward the jobs
and not toward females in these jobs. 


Discussion 


The results of this experiment suggest the 
importance of the type of role models girls 
see in the books they read. Girls in this 
study who heard stories with women in tra- 
ditional roles showed a clear tendency to 
make more traditional, stereotypical re- 
sponses than girls who heard stories involv- 
ing nontraditional women. They preferred 
pictures of women in traditional roles and 
indicated that traditionally female jobs 
sounded more appealing than nontraditional 
jobs. Girls who heard stories about non- 
traditional women were more likely to pick 
nontraditional jobs. 

The presence of a female experimenter 
may have contributed to these results. She 
served as a live role model who could have 
been interpreted as supporting or repre- 
senting the women in the stories she was 
reading. Since she could be regarded as an 
authority figure, this would give more weight 
to the viewpoints presented in the stories. 
The observed effect might not have been so
great if the children had read the stories
themselves. 

The setting of the experiment may also
have had an effect. The children were al-
lowed to leave their classrooms and their
work during the school day to hear a story.
This change in their usual routine may have
made them particularly receptive to what-
ever material was presented.

The magnitude of the results is somewhat
surprising, considering the limitations of the
experiment. Students in the experimental
groups received only two treatment sessions.
They heard only two stories lasting about [...]
minutes each. One might have expected
that no effect would have been shown over
such a short period of time. In contrast,
Litcher and Johnson (1969) exposed stu-
dents to multiethnic readers for a period of
4 months. As the experimenter read the 
stories and administered the picture-choice 
tests, she may have unknowingly influenced 
the results. However, a different person 
administered the adjective checklists in a 
different setting, and these results 
support the hypothesis. Thus, the overall 
effect cannot be explained solely on the basis 
of experimenter bias or experimental de- 
mand characteristics. 

The results of three of the five posttests 
were significant for females, and the results 
of the others were in the predicted direction 
in spite of the brief treatment conditions.
This suggests that the effect of sexually
stereotyped children's books on girls over a
long period of time can be major. Surveys
of current children’s literature suggest that
this is indeed the type of book that is gen-
erally available to girls. It can no longer be
suggested that such books will have little
impact on the girls who read them. Further
research should examine the effect of more
prolonged treatment conditions on the at-
titudes of girls to more closely approximate
the actual effect of reading such books for 
years. 

Further research needs to be done on the
effect of sexually stereotyped books on
males. It may be that nontraditional stories
would widen males’ perceptions of what
constitutes appropriate behavior and char-
acteristics for females. This idea should
[...]
bias in children’s books and textbooks. One
cannot expose girls to sexist books
throughout childhood and then grant them
a “free” choice of the role they want as an
adult because such choices will not actually
be free. If girls have been told by books all
their lives that the role of mother is the
highest to which they can aspire, they are not
likely to consider the role of doctor or car-
penter as a real option. By widening the
range of options available to girls and women 
in children’s books, one may hope to widen 
the range of options that girls will consider 
appropriate for themselves. 


Reference Note


1. Stull, M. “Isn't that just like a girl!” The sex typing
of girls and women in children's literature. Un-
published manuscript, Kirkland College, 1973.

One hundred fifty-four children who were in either the second or fifth grade
were asked questions concerning the expectations of their friends and teacher 
regarding their behavior as students. A measure of sociometric status was 
also obtained. It was hypothesized that role variables condition the relation- 
ship between achievement motivation and scholastic performance. In sup- 
port of the hypothesis, a positive relationship between a socially based mea- 
sure of achievement motivation and scholastic performance was found for 
upper graders who experience low role conflict and also for upper graders who 
are high in sociometric status. It was also found that among upper graders, 
autonomously based achievement motivation relates most strongly to the per-
formance of children low in sociometric status.


One of the puzzles in the literature on 
the relationship between achievement mo- 
tivation and performance is the lack of con- 
sistent positive findings. Although it seems 
that the energy and persistence which 
characterize pupils who have strong moti- 
vation to achieve would lead to better school 
grades, research findings have been mixed. 
Some studies have found support for the 
relationship (Cox, 1962; Ricciuti & Sadacca, 
1955), and others have found no effects 
(Feld, 1960; Shaw, 1961). Atkinson and 
Raynor (1974) recently summarized the 
status of this relationship as a “now you see
it, now you don't” phenomenon (pp. 200-

Researchers have expected to find positive
motivation-performance correlations on the 
assumption that an individual’s motive to 
achieve is a unitary characteristic that is 
engaged whenever he or she is challenged to 
perform to some criterion of excellence, 
whether the challenge involves a brief task 
in a laboratory, the scholastic curriculum, 
athletics, or whatever. But a number of re- 
searchers have questioned the unitary nature 
of the achievement motive and suggested 
that its evocation depends on several prop- 
erties of the stimulus situation in addition to 
its achievement challenge (see Crandall, 
Katkovsky, & Preston, 1960; Maehr, 1974; 
Veroff, McClelland, & Ruhland, 1975). The 
present study attempts to shed some light on 
this matter by specifying certain additional 
types of variables that need to be considered
in predicting a positive relationship between
achievement motivation and the scholastic
performance of elementary school children.
These variables relate to the social condi-
tions in a classroom and the developmental
stage of the achievement motivation.

Veroff’s (1969) distinction between sit-
uations in which achievement standards are
based on bettering one's own previous per-
formance and situations in which self-other
comparisons are dominant is potentially
valuable in clarifying motivation-performance
relationships in children. According to Ve-
roff’s developmental theory, autonomously
oriented achievement motivation devel-
ops early and is the primary source of
achievement motivation among elementary
school children in the early years. The in-
creasing social awareness of the child as well
as the demands of the typical classroom
[...] the development of the self-other or
social comparison orientation, which in-
creases in strength until around the fifth
grade. Previous work (Feld, Ruhland, &
[...], in press; Veroff, 1969) has found em-
pirical support for this developmental
model. More important for the present
study, Feld et al. also found that scholastic
performance was differentially predicted by
the two types of achievement motivation for
younger and older children in elementary
school. Specifically they found moderate
relationships between autonomous
achievement motivation and scholastic
performance for second graders (y = .37, p
[...], and y = .12, ns, for report card grades
and standardized achievement tests, re-
spectively) and between social comparison
[...]
[...] for positive relationships between
autonomous achievement motivation and
performance for fifth graders (y = .23, p [...]
and y = .19, p < .10), but not for the re-
lationships between social comparison
achievement motivation and performance
for second graders. Thus specification of
level and type of achievement motivation
seems to have improved somewhat the pre-
diction of a positive motivation-performance
relationship, and we will maintain these
specifications in the present study. But
given the weakness of these relationships
and the historical tradition of “now you see
it, now you don't,” we are interested in the
present study in whether there are social
psychological variables in the classroom
that further condition the motivation-
performance relationship so that it holds for
certain children but not for others.

Our conceptualization of social conditions
in the classroom is rooted in a role theory
[...] of that situation. It is assumed that
children in classrooms occupy the position
of student and that there are attendant role
demands from other students and the 
teacher that contribute to each child’s defi- 
nition of appropriate role performance in 
that position. The present study investi- 
gates two types of problems a child may en- 
counter in the student role—role conflict and 
lack of acceptance, or social isolation. The 
guiding hypothesis for this study is that 
these problems interfere with the realization 
of the child’s achievement motivation in 
school performance. 

Role conflict exists to the degree that there 
are incompatibilities among the various so- 
cial pressures exerted upon the individual to 
believe, to experience, and to behave in cer- 
tain ways appropriate to his or her social 
positions. A major potential for conflict in 
the student role is that students receive 
messages about that role both from the 
teachers and from classmates. Student role 
conflict should exist to the extent that the 
teacher’s view of the student role is coun- 
tered with opposing demands by friends. 
Prior research suggests that such conflict is 
likely to occur when teachers value scholastic 
effort and achievement while friends nega- 
tively sanction them. 

The potency of role conflict to condition 
scholastic achievement has been amply 
documented among high school students (cf. 
Coleman, 1961; Gordon, 1957) with research 
that shows that in many schools peer norms 
are counter to scholastic achievement, and 
they do reduce students’ efforts and levels of 
performance. One may expect similar 
findings among elementary school pupils 
even though athletic and social concerns 
important to high school students are not so 
well institutionalized in the primary grades. 
However, generalizations from high school 
to elementary school students may not be 
valid: Identification with peers seems to be 
less strong among younger children, who 
seem relatively more attached to the values 
of parents and teachers (Long, Henderson, 
& Ziller, 1967), and the elementary student 
subculture may not be as incompatible with 
scholastic norms as the high school student 
subculture seems to be. On balance, though, 
we expect that such role conflict as elemen- 
tary school children do perceive between 
their friends’ and teachers’ definitions of 
appropriate student role behavior should
interfere with the engagement of achieve- 
ment motivation in scholastic performance. 
The negative consequences of role conflict 
do not appear to us to be differentially linked 
to the use of autonomous or social compari- 
son achievement standards of excellence, 
and we would therefore expect the engage- 
ment of either type of achievement motiva- 
tion in scholastic performance to be reduced 
under conditions of high role conflict. In 
keeping with earlier findings of the greater 
salience of autonomous achievement moti- 
vation for younger children and social com- 
parison achievement motivation for older 
children, we do, however, expect that role 
conflict will differentially condition the 
motivation-performance relationship for 
younger and older children. 

Social acceptance by one's classmates is 
assumed to be an aspect of total student role 
performance that can affect the motivational 
links of the scholastic aspect of that role 
performance. Social acceptance or isolation 
may be an important determinant of 
whether social comparison achievement 
motivation becomes attached to school ac- 
tivities. In Veroff's (1969) model, the typi- 
cal or perhaps ideal case is one in which the 
social development of the child and the in- 
creasing opportunities for social interaction
that occur during the early elementary 
school years foster a movement away from 
a total reliance on evaluation of success in 
terms of one’s own previous levels of at- 
tainment to a comparison of outcomes with 
members of an appropriate reference group, 
for example, classmates. One important 
function of social comparison is that it pro-
vides children with information about their
relative strengths and weaknesses. For 
popular children who are secure in social 
relationships, this information becomes part 
of the way they view themselves and will
eventually help them as young adults to
make decisions about the kinds of special- 
ized roles for which they are best suited. For 
the unpopular child, however, social com- 
parison information can be threatening. 
The unpopular child may be the class
“brain” or the “dunce.” In either case, we 
suspect that a failure to achieve peer accep- 
tance will make the social comparison pro- 
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cess anxiety arousing, and therefore, 
comparison achievement motivation 
more likely be linked to scholastic perfor- 
mance under conditions of peer accepi 
than peer rejection or isolation. Given thai 
social comparison achievement motivation 
has been found to relate to the scholastic 
performance of older, but not younger, ele- 
mentary school children, this conditioning 
effect of social acceptance or rejection is
expected to be clearer for older children.

The potential effect of lack of social acceptance
on the relationship of autonomous
achievement motivation to school perfor- 
mance is somewhat more ambiguous. On 
the one hand, we expect that any kind of 
problem in the student role will have adverse 
effects on the motivation-performance relationship,
and autonomous achievement
motivation should be more closely related to school performance
under conditions of peer acceptance than
peer rejection or isolation. On the other 
hand, rejection or isolation from peers may 
force children to rely to a greater extent on 
their own internal standards and to use the 
teacher rather than peers for social approval. 
If the latter is the case, autonomous 
achievement motivation should be related 
to school performance when peer acceptance
is absent.

With these considerations in mind we
formulated the following hypotheses as
guides for analyzing the data on the effect of
role problems on the relationship between
achievement motivation and scholastic
performance.

Hypothesis 1. Achievement motivation
is expected to show reliable positive relationships
to scholastic performance under
conditions of low student role conflict. This
effect will be most evident when second
graders' autonomous achievement motivation
and fifth graders’ social comparison
achievement motivation are involved.

Hypothesis 2. Social comparison
achievement motivation is expected to show
reliable positive relationships with scholastic
performance under conditions of high peer
acceptance (popularity). This effect will be
most evident for the fifth graders.

Although no hypothesis is made concerning
the effect of popularity on the relationship
of autonomous achievement motivation
to school performance, these effects
will be reported separately for younger and
older children and interpreted in light of the
opposing arguments made above.


Method¹


Subjects 


The subjects were elementary school children who 
were in either the first (n = 82) or fourth (n = 72) grade 
at the initial testing period and in the second or fifth
grade at the final testing period. Because they participated
in a larger research program on social psychological
variables in the classroom, the children were
from two schools, one predominantly black and the 
other about two thirds white. This racial mixture was 
related to hypotheses in the larger study and will not be 
of concern here (see Gold, Feld, & Ruhland, 1976; 
Ruhland & Feld, 1977). At Time 1, the average age was 
84 and 122 months, and mean grade equivalents on 
vocabulary subtests of standardized achievement tests 
were 2.10 and 3.83 for the younger and older groups, 
respectively. 


Procedure 


At each of three times of measurement, children were 
seen in one or more individual testing sessions. An 
experimenter took each child to a special testing room 
and introduced the study as being about what children
do in school and how children feel about different things
that happen in school.

At Time 1 and Time 2 each child was seen in one individual
session, which included a storytelling measure
of achievement motivation. At Time 3, two individual
testing sessions were required because of the inclusion
of the role problems measures. The first session included
the sociometric measure and the child’s perceptions
of the teacher's attitudes toward various
school-related behaviors. During the second session
the storytelling measure of achievement motivation was
administered and the child’s perceptions of his or her
friends’ attitudes toward the school-related behaviors
were assessed.


Instruments 


Fantasy measures of achievement motivation.
Autonomous and social comparison achievement motivation
was assessed using a projective measure based
on the standard projective measure (McClelland, Atkinson,
Clark, & Lowell, 1953) but distinguishing between
autonomous and social comparison achievement
motivation. The scoring distinction is fully described
elsewhere (Feld et al., in press). This fantasy measure
captured developmental changes in the relative
incidence of autonomous-social comparison standards
similar to those reported by Veroff (1969) for his level
of aspiration measures.

Preliminary analyses using the motivation measures
from the three testing periods and achievement measures
from Times 1 and 3 indicated that there was little
relationship between the motivational measures and
either academic achievement tests or report card grades.
Since the range of possible scores on the measures of 
motivation was very small, and since such scores can be 
greatly affected by immediate experiences of success or 
failure (see McClelland et al., 1953), the scores were 
combined for each type of measure across all three times 
in order to obtain more reliable indices. This resulted 
in the following two indices of achievement motivation: 
number of stories scored for autonomous achievement 
imagery (range 0-4), and number of stories scored for 
social comparison achievement motivation (range 
0-9). 

Measures of scholastic performance. Two interre- 
lated measures of school performance were used, report 
card grades and scores on standard achievement tests. 
The report card grades used were the final grades in
reading and arithmetic issued to the pupils at the end
of the second and fifth grades. While the grade distributions
in the two schools were similar, they were
different enough that we decided to standardize the 
grades within school. These arithmetic and reading 
standardized scores were then summed to obtain a 
single index of school performance measured by report 
card grades. 
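
The standardization step described above can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, the grades are assumed to be coded numerically, and the population standard deviation is assumed because the text does not say which form was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within_group(values, groups):
    """Convert each value to a z score relative to its own group (here, school)."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    # Assumption: population SD; the article does not specify the formula.
    stats = {g: (mean(vs), pstdev(vs)) for g, vs in by_group.items()}
    z_scores = []
    for value, group in zip(values, groups):
        m, s = stats[group]
        z_scores.append(0.0 if s == 0 else (value - m) / s)
    return z_scores


def report_card_index(reading, arithmetic, schools):
    """Sum the within-school standardized reading and arithmetic grades."""
    z_reading = standardize_within_group(reading, schools)
    z_arithmetic = standardize_within_group(arithmetic, schools)
    return [r + a for r, a in zip(z_reading, z_arithmetic)]
```

Standardizing within school before summing means that a pupil's index reflects standing relative to schoolmates, so differences between the two schools' grade distributions do not enter the index.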

The standard achievement tests were administered 
in several sessions to each classroom as a group, by the 
teachers when it was part of the school testing program 
and by members of the research staff when it was not.
Whenever possible, pupils who missed being tested 
because they were absent were tested alone or in small 
groups when they were available later. Accommodating 
to the testing practices of the schools, we employed the 
Iowa Test of Basic Skills (Lindquist & Heironymous, 
1973) at the end of the fifth grade and the Metropolitan 
Achievement Test (Prescott & Balow, 1970) at the end 
of the second grade. We selected subtest scores of 
verbal and arithmetic skills from each test and summed 
standardized versions to obtain the measures of scho- 
lastic performance. For the lower graders the word 
knowledge and mathematics subscales were used, and 
for the upper graders the vocabulary and arithmetic 
concepts subscales were used.

As expected, there were substantial correlations between
report card grades and achievement test scores.
The correlations ranged between .52 and .63 and were 
all significant (p < .001). 

Student role conflict. Pilot work with a large battery 
of items descriptive of students' behavior led to the 
selection of 19 items (see Table 1) that describe students 
both positively and negatively. Children’s descriptions
of the kinds of student role demands they received were
obtained by asking them to guess how they thought
their teachers and friends would feel if a child behaved
as described in these items.

During the first individual testing session at Time 3, 
a posterboard was set up on a table before the child. In 


1 Detailed descriptions of the subjects, procedures,
and instruments are available in Gold, Feld, and Ruhland
(1976).


Table 1
Percentage of Children Responding Either
“Very Happy” or “Very Unhappy” to Student
Behavior Items

Item(a)                             Teacher   Friend
Uses bad words around school           83       63
Is quiet while teacher is talking      81       63
Is a good reader                       79       73
Cheats on a test                       74       57
Has good ideas                         69       52
Tries hard                             63       51
Sings very well                        61       43
Laughs at others' wrong answer         60       40
Hits other children                    57       44
Does well in gym                       55       52
Waits a turn in line                   54       57
Gets lots of answers wrong             54       44
Lazy at doing schoolwork               52       36
Doesn't finish schoolwork              48       31
Bothers busy teacher                   46       34
Has a lot of friends                   42       49
Gets to school on time                 37       49
Isn’t good at arithmetic               37       29
Acts silly                             31       36

(a) The items have been slightly shortened for presentation
in the table.


horizontal array at about eye level were five round faces, 
each face with two straight lines to depict eyes. The 
mouths of the faces varied to suggest different emotions: 
From left to right was a mouth-line curved symmetri- 
cally and radically downward; a line curved moderately 
downward; a straight line; a line curved moderately 
upward; and a line curved radically upward. Under 
each face was an inscription to reinforce the emotion 
depicted: “very unhappy,” “a little unhappy,” “does
not care,” “a little happy,” and “very happy.” Beneath
each inscription was a cut out opening big enough to
pass through a 3 × 5 in. card. The child was told to
pretend that the faces were those of a teacher, and a 
sign, “A teacher,” was at that point attached to the 
posterboard above the faces. The tester told the child 
that some things that children do “make a teacher ‘very
unhappy’ or ‘a little unhappy,’” and so on, reading the
inscriptions beneath the faces and pointing to each one. 
(“Does not care” was also described as a teacher who “is 
just looking” in order to provide the child with the 
further opportunity to reject a behavior item as irrele- 
vant to the teacher’s view of the student role.) 

The game was then explained to the child as one in 
which he or she guessed how a teacher would feel about 
some things that boys and girls do in school. To insure 
that the child understood the distinctions among the 
five reactions depicted by the faces, he or she was then 
asked a series of open-ended questions about the kinds
of things a child might do in school to make a teacher
very happy, a little happy, and so on. In all but a few 
cases it was apparent that the children understood im- 
mediately the nature of the game; in instances when the 
child’s responses were inappropriate, for example,
having nothing to do with children’s behavior, the
instructions were repeated and explained some more before
proceeding. Then the tester moved behind the
board, out of the child’s sight.

At this point the tester began to read the 19 items of
behavior to the child, handing a 3 × 5 in. card with the
item printed on it around the board to the child after
having read it and instructing the child to put the card
through the slot in the posterboard underneath the face
that he or she guessed most looked like how the teacher
would feel if a child behaved that way. The tester recorded
the child’s responses from the back of the
board.

At the second individual testing session at Time 3, the
procedure was repeated, except that the child was asked
for descriptions of how “your friend” would feel instead
of how “a teacher” would feel about the behaviors. Two
different orders of presenting the 19 behavior items 
were used for “a teacher” and “your friend,” and these 
two were alternated randomly among the pupils and the 
task. Item order turned out to make no difference in 
distributions of responses or in tests of selected rela- 
tionships and so was ignored in later analyses. 

All of the children seemed to understand the procedure
and to respond meaningfully to the instructions.
No one’s data had to be discarded in toto, and only 5
descriptions of the teacher's view and 11 of the friend's
scattered among several subjects became missing data
because of recording failures. Appropriately positive
or negative correlations between responses to positively
and negatively phrased behaviors also indicated that
the method was effective in getting children’s descriptions
of their roles.

As can be seen in Table 1, the children for the most
part perceived that teachers and friends would regard
the various behaviors with the same relative importance.
As an index of a behavior’s importance, we used the
proportion who guessed that a teacher or friend would
be either very happy or very unhappy (whichever was
larger) and compared the rank orders of items on this
index (ρ = .74, p < .01). But the data also show that on
the whole there was less consensus about how friends
would feel than about how teachers would feel, with eight of
the items eliciting 60% or greater consensus about
teachers' feelings and only three about friends'
feelings.
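
The rank-order comparison reported above can be illustrated with the percentages in Table 1. The sketch below is not the authors' procedure: it assumes a Spearman coefficient computed from the classic sum-of-squared-rank-differences formula with tied values given averaged ranks, and the two lists are the Teacher and Friend columns as read from the scanned table. The helper names are hypothetical.

```python
def midranks(values):
    """Rank values from largest (rank 1) to smallest, averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order coefficient: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rank_x, rank_y = midranks(x), midranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Table 1 columns, in the order the 19 items are listed.
teacher = [83, 81, 79, 74, 69, 63, 61, 60, 57, 55, 54, 54, 52, 48, 46, 42, 37, 37, 31]
friend = [63, 63, 73, 57, 52, 51, 43, 40, 44, 52, 57, 44, 36, 31, 34, 49, 49, 29, 36]

rho = spearman_rho(teacher, friend)
print(round(rho, 2))  # 0.74
```

Under these assumptions the tabled percentages reproduce the reported coefficient of .74.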
With a view to developing an index or indices of role
conflict based on the congruence of the children's descriptions
of how their teachers and their friends would
feel about the behavior items, a series of factor analyses
were run on the responses to the two parts of the task.
We found no evidence of clustering of teachers' or
friends’ expected reactions to items concerning
achievement behaviors (e.g., is a good reader) separately
from items concerning deportment (e.g., hits other
children) or items of effort (e.g., tries hard). There was
minimal clustering of positive and negative behaviors
into separate factors, but the differential loadings of
items on the factors were not great when orthogonal
rotation was employed. It was therefore decided to use
all 19 behavior items in a single index, a strategy that
required the fewest assumptions or manipulations of the
raw data. The sum of the item-by-item discrepancy
scores between teachers’ and friends’ expected reactions
yielded the total conflict index. Inasmuch as
analysis of the discrepancy scores indicated that each
item discrepancy correlates significantly (p < .05) with
the total index, the decision to include all items seemed
legitimate.


Table 2
Relationships (Gamma) of Cross-Wave Measure of Autonomous and Social Comparison
Achievement Motivation to Student Achievement Variables at Three Levels of Role Conflict

                      Lower grade                     Upper grade
               Report card    Achievement      Report card    Achievement
                 grades       test scores        grades       test scores
Role
conflict         γ      n       γ      n         γ      n       γ      n

Autonomous achievement motivation
High            rte    14      .24    15        .22    23      .28    24
Medium          .31    25      0      27        .02    18      .16    21
Low             .29    22      .12    23        .37*   21      3      27

Social comparison achievement motivation
High            .24    14      .32    32        AT     23      .21    24
Medium          .18    25      .12    27        .18    18      .26    21
Low             .01    22     -.16    23        .43**  27      .36**  27

* p < .05.
** p < .01.
*** p < .001.


The observed teacher-friend conflict index reaches a
maximum of 48 out of a possible range of 0 to 76, with a higher score
indicating a greater discrepancy or role conflict. The
index was divided into thirds within grade, and the
children were rated as high, medium, or low in role
conflict.
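
The construction of the conflict index described above can be sketched as follows. This is a minimal illustration under stated assumptions: the five faces are assumed to be coded 1 (“very unhappy”) through 5 (“very happy”), and an item discrepancy is assumed to be the absolute difference between the teacher and friend codes. Both assumptions agree with the stated possible range of 0 to 76 for 19 items, but the text does not spell them out. The function names and the rule for forming thirds (by rank position, ties broken by input order) are hypothetical.

```python
def conflict_index(teacher_codes, friend_codes):
    """Sum of item-by-item absolute discrepancies between the two sets of codes."""
    if len(teacher_codes) != len(friend_codes):
        raise ValueError("both response lists must cover the same items")
    return sum(abs(t - f) for t, f in zip(teacher_codes, friend_codes))


def split_into_thirds(scores):
    """Label each score 'low', 'medium', or 'high' by its rank position."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [None] * len(scores)
    n = len(scores)
    for position, index in enumerate(order):
        labels[index] = ("low", "medium", "high")[position * 3 // n]
    return labels
```

In the study the split was made within grade, so a function like `split_into_thirds` would be applied separately to the second-grade and fifth-grade scores.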
Popularity-social isolation. During the first individual
session at Time 3, children were asked initially
to name as many of the children in their class that they
could remember. The tester reminded the child of any
classmates who were omitted and corrected false inclusions.
Having thus established a classroom list, the
tester then asked, “Now, thinking of all the children in
your class, whom do you most want to be good friends
with?” and prompted with “Anyone else?” for further
nominations up to a fixed limit. This procedure was repeated with
the questions, “Which ones in your class would you most
like to play with during recess?” and “If your teacher
wanted some children to work together on something
in class, whom would you most want to work with?”
The measure of a child's popularity was the total
number of times he or she was named by classmates in
response to the three questions divided by the number
of classmates. The less popular the child was, the
greater was the assumed social isolation. This index
was also divided into thirds within grade, and the children
were rated as high, medium, or low in popularity-social
isolation. Sociometric measures of this type
have been used frequently with young children and
show reliability during the course of the school year
(Lippitt & Gold, 1959) and between school years
([...]way, 1968).
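
The popularity measure described above can be sketched as follows. This is an illustrative sketch only: the data layout and function name are hypothetical, and the divisor is assumed to be the class size minus one (the number of classmates who could have named the child), which the text does not state explicitly.

```python
from collections import Counter


def popularity_scores(nominations):
    """Nominations received across the three questions, per available classmate.

    `nominations` maps each child to a list of three lists of names, one list
    per sociometric question. Self-nominations are ignored.
    """
    received = Counter()
    for nominator, answers in nominations.items():
        for names in answers:
            for name in names:
                if name != nominator:
                    received[name] += 1
    # Assumption: divide by the number of other children in the class.
    classmates = len(nominations) - 1
    return {child: received[child] / classmates for child in nominations}
```

Dividing by the number of classmates keeps the index comparable across classrooms of different sizes, which matters because the split into thirds was made within grade rather than within classroom.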


Results


Each of the motivational indices was correlated
with achievement test scores and
report card grades for each of the three levels
of the measures of role problems, separately
for younger and older children.² The
gamma statistic (γ) was used because the
motivational measures had a small range and 
included a large number of ties (see Hays, 
1963, p. 655). 
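
For readers unfamiliar with the statistic, the sketch below computes gamma in the Goodman and Kruskal form, (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs and tied pairs are left out. It is offered as an illustration of why gamma suits measures with many ties, not as the authors' computation; the function name is hypothetical.

```python
def goodman_kruskal_gamma(x, y):
    """Gamma = (concordant - discordant) / (concordant + discordant).

    Pairs tied on either variable contribute to neither count.
    """
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)
```

Because tied pairs are dropped from both the numerator and the denominator, a motivation score with a range of only 0 to 4 does not pull the coefficient toward zero in the way it would for statistics that count ties in the denominator.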


Hypothesis 1 


Achievement motivation is expected to 
show reliable positive relationships to 
scholastic performance under conditions of 
low student role conflict. This effect will be 
most evident when second graders' autono- 
mous achievement motivation and fifth 
graders’ social comparison achievement
motivation are involved. The results in 
Table 2 support this hypothesis among fifth 
graders for the relationship of socially based 


2 The results were unchanged when partial correlations
were performed that controlled for the influence
of the length of the stories that showed positive correlations
with both the motivation and performance
measures.


Table 3
Relationships (Gamma) of Cross-Wave Measure of Autonomous and Social Comparison
Achievement Motivation to Student Achievement Variables at Three Levels of Popularity

                      Lower grade                     Upper grade
               Report card    Achievement      Report card    Achievement
                 grades       test scores        grades       test scores
Popularity       γ      n       γ      n         γ      n       γ      n

Autonomous achievement motivation
High            .40*   23      .45    24        .06    21     -.14    22
Medium          .20    20      .12    22        .23    27      E :    29
Low             dvi?   16      .28    17        .53**  18      .34    19

Social comparison achievement motivation
High            .28    23      .15    24        .65*** 21      .38**  22
Medium          .06    20      -43    22        .40**  21      .29    29
Low             .90    16      .05    17       -.19    18      .21    19

* p < .05.
** p < .01.
*** p < .001.

achievement motivation to both report card 
grades and the achievement test: The only 
significant relationships between scholastic 
performance and socially based achievement 
motivation are for children who perceive 
little difference between the role demands
from the teacher and from friends. When the
autonomously based measure is used, the 
findings for fifth graders are less clear; the 
only significant relationship, however, is 
between autonomously oriented achieve- 
ment motivation and report card grades 
under conditions of low role conflict, which 
is consistent with the hypothesis. 

The results for the second graders are 
contrary to the hypothesis. Although the 
motivation-performance correlation is sig- 
nificant in only one emphatic instance, the 
relationships between motivation and per- 
formance tend to be higher under conditions 
of high rather than low role conflict. 


Hypothesis 2 


Social comparison achievement motiva- 
tion is expected to show reliable positive 
relationships with scholastic performance 
under conditions of high peer acceptance 
(popularity). This effect will be most evident
for the fifth graders. This hypothesis
is confirmed for fifth graders, for whom there
are strong positive relationships between
social comparison achievement motivation
and school performance, especially report
card grades, only when popularity is moderate
or high (see Table 3). Although not
significant, the results for the second graders
are in the expected direction.

The effect of social acceptance or rejection
on the relationship of autonomous achievement
motivation to school performance is
also shown in Table 3. Among fifth graders,
significant relationships between autonomous
achievement motivation and both
measures of scholastic performance were
found only when popularity was at low or
medium levels. Among second graders the
strongest relationships between autonomous
achievement motivation and performance
are found among the least popular children.
The general pattern of the results for the
second graders, however, manifests some
inconsistencies, for example, the significant
relationship between autonomous achievement
motivation and report card grades for
the highly popular children.


Discussion 


The hypothesis that role problems condition
the relationship between motivation
and performance receives strong support
among fifth graders when the social comparison
measure of achievement motivation
is employed. When there is high role conflict,
as assessed by lack of congruence between
the child's perceptions of the teacher's and
friends’ views of the student role, there is an
absence of significant relationships between
motivation and performance, whereas when
role conflict is minimal, there are significant
motivation-performance relationships.
Analyses of the independent effect of
friends’ views of the student role indicated
that it is conflict between teacher and peer
expectations rather than merely a lack of
peer support for positive school behaviors
that conditions the motivation-performance
relationships. Role problems of lack of peer
acceptance are also associated with an absence
of significant relationships between
social comparison achievement motivation
and performance among the older children.
When peer acceptance is high, however,
there are strong positive relationships between
social comparison achievement motivation
and performance.

The relationship among fifth graders of
autonomous achievement motivation to
scholastic attainment is also conditioned by
peer rejection: The strongest relationships
occur for the children who are least popular
with their classmates. This pattern suggests
that the potentially debilitating effects of
peer rejection on scholastic performance may
be cushioned by a motivational orientation
based on comparisons with one's own previous
performance. In other words, among
children who do not have close social ties to
their classmates, those who strive for excellence
in relation to their own past performance
are more likely to succeed scholastically
than those who are lacking this self-oriented
motivation.

The results for the second graders are
puzzling. When children are grouped according
to how much conflict they perceive
in the student role, the data suggest that the
motivation-performance relationship is
strongest among children who perceive a
high degree of conflict. Our first
thought was to suspect that for second graders
low role congruence was based on the veridical
perception that teachers and students
do have somewhat different expectations for
role performance. If this were true, the 
children who perceived low congruence 
should have demonstrated greater intellec- 
tual or social maturity on other measures. 
Results from the standard achievement tests 
and the Children’s Social Desirability Scale 
(Crandall, Crandall, & Katkovsky, 1965) 
failed to indicate such differences. The 
explanation for these findings must await 
further work (preferably longitudinal) on the 
development of role conceptions and their 
effect on the relationship of motivation to 
performance. 

There is some weak evidence among the 
second graders that social rejection may 
condition the motivation-performance re- 
lationship in a manner consistent with the 
hypothesis. One possible explanation for 
the weak and inconsistent findings among 
second graders may be that social compari- 
son processes are really only beginning to 
emerge so that at all levels of popularity the 
relationships between social comparison 
achievement motivation and the perfor- 
mance measures are quite weak. Similarly, 
the fact that autonomous standards are to a 
large extent the only achievement standards 
at this age might tend to obscure the role of 
social psychological variables in conditioning 
the motivation-performance relationship. 

A final word should be said about the 
middle sociometric group. Veroff’s (1969) 
theory describes a final, integrated stage of 
achievement motivation in which autono- 
mous and social standards are both impor- 
tant for the individual. The fact that both 
autonomous and social comparison 
achievement motivation are related to per- 
formance for the moderately popular group 
of fifth graders may suggest that they are 
moving toward this integrated stage of 
achievement motivation. Future research 
should focus on the dynamics operative for 
this group. We suspect that they are neither 
barred from the comparison process by iso- 
lation and anxiety (as perhaps are the un- 
popular children) nor overwhelmed by it as 
part of a generally strong social orientation 
(as perhaps are the highly popular chil- 
dren). 

The impetus for this research was the inconsistency
in the literature on the relationship
of motivation to performance. The
results of this study suggest that this rela- 
tionship holds only under certain conditions. 
The simultaneous consideration of children’s 
age, role problems, motivation, and perfor- 
mance indicates that the presence or absence 
of student role problems may be an impor- 
tant conditioner of whether a child’s 
achievement motivation is realized in school 
performance, and its potency is greater 
among older elementary school students. 
This analysis of achievement motivation 
and scholastic performance has moved be- 
yond previous research by suggesting why 
relationships between these variables are 
often small or nonexistent and under what 
conditions one might expect the relationship 
to hold. Although the implications of these 
findings for the classroom teacher are only 
speculative at this point, we would like to 
raise two issues for the consideration of 
teachers and for future research. 

1. As teachers are often aware, their ex- 
pectations for school behavior may be in 
conflict with peer norms. In cases where 
this conflict is marked, it appears to be dif- 
ficult for even highly achievement motivated 
children to put their effort into their school- 
work. Teachers might attempt to identify 
the areas in which there are conflicts be- 
tween their attitudes and the attitudes of 
their students toward schoolwork and be- 
havior and attempt to reduce this conflict, 
since our data suggest this could result in the 
children investing greater effort in their 
school activities. 

2. Our data suggest that highly popular 
children with concern about autonomous 
standards of excellence may not connect this 
motivation to schoolwork, whereas unpop- 
ular children with such internal standards 
may effectively engage in school achieve- 
ment. In contrast, popular children with 
strong socially oriented achievement stan- 
dards do seem to do better in schoolwork 
than those deficient in these standards. In 
order to sustain both kinds of pupils’ scholastic
motivation, teachers should be aware
of these different motivational bases and try
to appeal to both. This dual strategy may
also advance the pupils’ maturation by 
helping them to develop an “integrated” 
orientation. If teachers emphasize the
achievement satisfactions possible from both
autonomous and social comparison standards
and encourage children to strive in
relation to both, the children may more
likely become motivated to excel in a wider
variety of achievement situations.


Individual Differences Model Applied to Instruction
and Evaluation of Large College Classes


R. Michael Latta
University of New Hampshire


Mark Grabe


Warren D. Dolphin
Iowa State University


This study developed an approach to evaluating college instruction based on
individual differences among students. Data for 99 males and 92 females
taught by a self-paced testing method were compared with data for 102 males
and 92 females taught by the traditional method. Data from both the cognitive
and affective domains are presented. Results from several modes of analysis
culminating in a path analysis corroborate contemporary experimental
evidence concerning the determinants of achievement and indicate some
advantages of the self-paced method. On the basis of these data, suggestions for
future research and for counseling students are made.


Growth in university enrollments has increased
the use of large lecture instruction
(n > 100), particularly in courses taught
during the first year. Student populations
have now stabilized, but restricted operating
funds necessitate continued use of this format
[...] instructional costs at a minimal [...]
see no reason why information cannot be
[...] of 300 students as easily [...]
secondary goals subordinate to efficient information
transfer. Some instructors,
however, are sensitive to these “secondary”
problems. In coping with the pedagogical
demands of the large lecture section, the
instructor is forced, often unwittingly, to
blend the differences between individual
students into the concept of the hypothetical
average student (HAS). The HAS is expected
to follow a rigid schedule of lectures,
reading, and testing, and deviations from the
schedule set up for the HAS in the course
syllabus are not readily accepted. The result
is an educational system that stresses
adherence to a timetable and efficient processing
of students through a course.

Many educators consider this kind of efficiency
to be a dubious goal, but a return to
smaller class size is not realistic given the
financial duress in academia. Consequently,
management innovations in instructional
systems are being tried in higher education.
Many educators use computers and other
devices to diminish the routine instructional
and clerical duties associated with large
lectures. Ostensibly, use of technological
advancement frees the instructor to interact
with students in smaller groups. This does
not always happen, however, because of
other academic demands placed on the instructor
or timidity of first-year students.
Consequently, the HAS concept has a
strong influence on the instructor. Recently,
experiments with alternative instructional
systems have been coupled with
technological developments. The thrust has
been to replace the HAS concept with a
theory that focuses on the individual within
the large group and allows for student differences
in educational background, personal
time allotments, and ability, without
compromising efficiency or academic integrity.

One instructional system that focuses on
the individual student and that is being used
frequently in higher education is the self-paced
learning system. Although there are
many varieties of this approach, certain
characteristics are common to all self-paced
systems. These characteristics include (a)
printed educational objectives; (b) small,
discrete study units; (c) competence before
progress to subsequent units; (d) multiple
opportunities to take tests; (e) nonpunitive
and diagnostic use of tests; (f) remedial activities
keyed to student deficiencies; and (g)
allowances for individual variations in work
rate (Block, 1971; Bloom, 1968; Dolphin,
Franke, Covert, & Jorgensen, 1973; Johnston
& Pennypacker, 1971; Keller, 1968;
McKeachie, 1973; Postlethwait, Novak, &
Murray, 1972).

This last point is not trivial, for it represents
a primary difference between traditional
and self-paced educational philosophies.
Traditionalists require the HAS to
learn fixed amounts of material in predetermined
amounts of elapsed time, with
aptitude defined in terms of the amount
learned in that period relative to the class
average. That is, testing is instructor-paced,
and grading is normative. In the self-paced
system, the elapsed time needed to learn a
given amount is allowed to vary according to
individual student differences. Aptitude is
represented by the time needed to reach a
predefined level of performance, and a student's
level of performance is defined independently
of elapsed time and the performance
of peers (Carroll, 1963). That is,
testing is student paced, and grading is criterion
referenced.

Although, as an educational philosophy, self-pacing
emphasizes individual student differences,
evaluations of this approach to instruction
generally have been based on average
affective and cognitive responses of groups.
Investigators have looked for changes in 
HAS performance and attitude and have not 
attempted to place students in subsets ac- 
cording to educational background, ability, 
or psychological variables in order to deter- 
mine if certain individuals profit or suffer 
under an instructional management strate- 
gy. Yet, unless a panacea is created, one 
would not expect all students to do well 
under one system or the other. Clearly, a
system designed to provide for individuals 
should be evaluated by looking at individual 
differences among students (Brown, 1971). 
Some researchers have correlated per- 
sonality variables with performance in self- 
paced and traditional systems (e.g., Hohn, 
Des Lauriers, & Deaton, Note 1) or have 
correlated an ability measure such as 
American College Test (ACT) scores with 
performance in a self-paced system (Wood 
& Wylie, Note 2). However, the determinants
of academic performance involve
student characteristics, properties of the
instructional system, and especially the
interaction of the two. This article takes into
account these three influences on student
performance in large college classes and
develops an evaluation model which emphasizes
the individual differences concept.


Method 
Self-Paced System of Instruction 


[...] Phase Achievement System (PAS) and is designed around
[...]. Multiple choice ex[...] compiled [...] a modular
format [...]cation number and generates master copies of
examinations from a 2,500-entry multiple choice pool. About
3,000 major and nonmajor students have been
instructed by this method. The system is presently being
modified to include a videotape library of lectures over
the content of each unit to be available to each student
upon demand.


Variables Under Consideration 


As a pilot study, a small (n = 85) summer school class
was used to determine if gross differences occur on 
comprehensive final examination performance. A 
single class was offered common lecture instruction but 
divided randomly into two testing groups, self-paced 
and traditional. These groups were given a diagnostic 
test on the first day of class and the same test items as 
a comprehensive examination on the last day of class. 
For both groups, the test represented 15% of the grade 
awarded. The test reliability was high (Kuder-Rich- 
ardson reliability coefficient = .90). No significant 
difference on the mean performance between the two
groups was detected using any of the following analyses:
(a) repeated measures analysis of variance using in- 
structional system and student gender as between- 
subjects factors and time of measurement as a within- 
subjects factor; (b) analysis of variance on gain scores; 
(c) analysis of variance on posttest scores only. Thus, 
it appears that there are no differences in group per- 
formance directly attributable to instructional system, 
though individuals may perform better in a self-paced 
system, and the self-paced system was not detrimental 
to the performance of the HAS. Such lack of cognitive 
differences is not uncommon in studies comparing 
traditional and self-paced learning strategies 
(McKeachie, 1973; Morris & Kimbrell, 1972; Silberman 
& Parker, 1974). 

A second study using past course records was con- 
ducted to determine what ability measures predict 
grade earned in freshman biology during the actual 
quarter of enrollment, or after satisfying the conditions 
for removal of an incomplete. The records of 953 stu- 
dents enrolled in PAS and traditional classes were an- 
alyzed in separate multiple regression analyses con- 
ducted independent of the order of the predictors of 
grade. The significant predictors (p < .001) in both 
sections were high school graduation ranking (HSR), 
the Minnesota Scholastic Aptitude Test (MSAT), and 
the American College Test (ACT). The MSAT rather 
than the ACT and the HSR was selected for further 
research because it was available for all students and 
was indicated in previous research to be the better 
predictor of grade in this biology course (Bennett, 1975;
see also Brown, 1971).

A third study extended the analysis of grade predictors
by including information on the number of high
school credits each student had taken in physics,
chemistry, and biology and measures of student anxiety
about tests and perseverance in the course. The
numbers of credits in high school physics, chemistry, and
biology were summed to form a composite variable, high
school science credits (HSSCI), which was considered
an index of the students' educational background
relevant to the biological sciences. The importance of
educational background was indicated in a previous,
unrelated study of the relation of high school science
experience to grade in freshman biology at our university
(Bennett, 1975). A measure of perseverance (effort)
was formed by summing the number of lectures
each student reported attending and the number of
study hours each student reported. Within the [...]
weeks of the course, each student answered a
questionnaire designed to measure anxiety felt in testing
situations. This Test Anxiety Questionnaire (TAQ)
was designed by Mandler and Sarason (1952) and was
obtained by personal communication in 1971. The
return rate for this questionnaire was 99%. At the end
of the course, a questionnaire designed by the authors
to measure attitudes toward the method of instruction
was given to the students. This questionnaire was
included to provide an outcome measure of student
opinions about the two instructional systems. To
control for response conditioning, items were
counterbalanced as negative and positive statements. The
return rate for this questionnaire was 99% also. (A copy
of this questionnaire is available upon request.) Thus,
the third study was concerned with the effects of
student ability, educational background, perseverance, and
anxiety on student performance and attitudes in
traditional and self-paced instructional systems.

In this last study, data from 99 males and 92 females
enrolled in the PAS section were compared with data
from 102 males and 92 females from a traditionally
instructed section. Students chose the section they
wished to attend, but they had no way of knowing which
section was PAS prior to registration. The second author
taught both sections. To make the design
quasi-experimental, the method of instruction was assigned
randomly to the two sections. Both sections used the
same course objectives and were given equal instruction
as nearly as possible in large lectures taught by the same
individual. The primary difference was in testing
procedures: The PAS group had at least five
opportunities to take or retake tests under a
criterion-referenced grading plan, while the traditional group had
three midterms and a final under a norm-referenced
testing plan (Popham & Husek, 1969). Specific grade
assignment in the traditional section was made on the
basis of normative criteria from previously established
frequency distributions of over 3,000 grades for the
course. No more than 10% of the students changed
sections or dropped the course during the term, and
there was no clear trend favoring one type of section. It
is hoped that there is minimal bias in the data due to
self-selection on the students' part, and the results
should accurately represent the consequences of
self-paced and traditional systems of instruction.

Throughout the results section grade will be used as
the dependent variable when the determinants of
performance are analyzed. Unfortunately, direct
comparisons of instructional systems on the basis of grade
cannot be made because of the obviously different
criteria used in the determination of grade between
systems. Scores on a common comprehensive examination,
however, would allow this comparison. Such an
examination was given, but there was a significant
difference in the weight of the results of the examination
between the two systems. In the self-paced system,
except for students at a boundary between grades, the
examination had no weight in determining their grade,
while in the traditional system, the examination result
comprised 25% of their grade. When asked, over 50%
of the self-paced students said they did not study for the
exam, while only 5% of the traditionally instructed
students said they did not study. Thus, no direct
comparison of these scores is justifiable because of
differential incentives for performance, just as no direct
comparison of grade earned is possible because of
different grading policies. We might mention, however,
that the class averages when compared between systems
were the same on this exam.


Results 


Results from Group Averages 


Descriptive statistics for male and female
students enrolled in each instructional
system appear in Table 1. Except for grade,
which is higher for PAS students, there are
no significant differences between students
in the two instructional systems. The table
illustrates that within each instructional
system, females tend to have greater ability
and spend more time in studying biology.
Consequently, one would expect that
females would perform better on the average.
This is not the case, however. Males
performed better in each system despite the fact
that females are considered more predictable
than males in academic settings (Gross,
Faggen, & McCarthy, 1974; Kahn, 1973).
Apparently female students are
underachieving in both sections, and other entries
in the table suggest that this may be related
to significantly higher test anxiety and lesser
background in science for females.


Table 1
Means and Standard Deviations for Student
Descriptive Statistics by Instructional System

                    PAS              Traditional
Measure          M        F         M        F

HSSCI^a
  M            1.74     1.49      1.68     1.51
  SD            .55      .66       .62      .70
MSAT^a
  M           41.76    46.07     43.62    45.97
  SD          12.46    11.13     11.08    10.81
TAQ^a
  M           92.67    98.91     93.36    96.17
  SD          19.49    25.717    20.99    21.57
Effort^a
  M           49.61    54.49     46.96    51.86
  SD          10.01    11.43     11.18    10.50
Grade^a
  M            3.63     3.58      3.36     3.0
  SD            .85      .89       .98     1.00

n                99       92       102       92

Note. HSSCI = high school science credits; MSAT =
Minnesota Scholastic Aptitude Test; TAQ = Test Anxiety
Questionnaire; PAS = Phase Achievement System; M = males;
F = females.
^a Differences between genders within each system are significant.


Table 2
Statistics Describing Predictors of Grade
Derived by Univariate Analysis

                    PAS              Traditional
Measure          M        F         M        F

HSSCI
  Slope         .45^a    .31^a     .27      .44^a
  Intercept    2.82     3.12      2.91     2.54
  Range       0-2.67   0-3.00    0-2.07   0-2.67
  R^2           .08      .05       .03      .09
MSAT
  Slope         .04^a    .04^a     .04^a    .06^a
  Intercept    1.99     1.98      1.47      .70
  Range       13-72    17-71     20-68    25-68
  R^2           .31      .20       .25      .35
TAQ
  Slope        -.01     -.01^a    -.003    -.02^a
  Intercept    4.40     4.71      3.63     4.92
  Range      45-142   39-162     9-169   91-145
  R^2           .04      .12       .01      .15
Effort
  Slope         .02^a    .006      .009    -.02^a
  Intercept    2.77     3.29      2.95     4.11
  Range       24-77    31-82     22-76    33-76
  R^2           .05      .01       .01      .04

Note. HSSCI = high school science credits; MSAT =
Minnesota Scholastic Aptitude Test; TAQ = Test Anxiety
Questionnaire; PAS = Phase Achievement System; M = males;
F = females.
^a Slopes that are significantly different (p < .05) from zero.
Boldface slopes are significantly different (p < .05) from other
slopes in the same row except for female traditional TAQ, which
differs only from male traditional TAQ.


Results from Univariate Analyses 


In order to confirm that the variables 
listed in the table were, in fact, predictors of 
grade, univariate analysis techniques were 
employed as the next level of analysis. 
Table 2 lists statistics derived from a re- 
gression analysis in which grade was the 
dependent variable. A separate analysis was
performed for each instructional system and
gender combination using HSSCI, MSAT, 
TAQ, and effort as single independent 
variables. These were considered to be 
measures of educational background, ability, 
anxiety during testing, and perseverance, 
respectively. Overall, 16 regressions were 
performed. 
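
The per-cell regressions described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names (`simple_regression`, `fit_by_cell`) and the records are hypothetical and invented for the example, and only one predictor (MSAT) is shown. Repeating the call for HSSCI, TAQ, and effort across the four system-by-gender cells would give the 16 fits.

```python
from statistics import mean


def simple_regression(x, y):
    """Least-squares fit of y on a single predictor x.

    Returns (slope, intercept, r_squared).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def fit_by_cell(rows, predictor):
    """Regress grade on `predictor` separately in each (system, gender) cell."""
    cells = {}
    for row in rows:
        cells.setdefault((row["system"], row["gender"]), []).append(row)
    return {
        cell: simple_regression(
            [r[predictor] for r in members],
            [r["grade"] for r in members],
        )
        for cell, members in cells.items()
    }


# Hypothetical records, invented for illustration only (not study data).
rows = [
    {"system": "PAS", "gender": "F", "MSAT": 30, "grade": 3.2},
    {"system": "PAS", "gender": "F", "MSAT": 40, "grade": 3.6},
    {"system": "PAS", "gender": "F", "MSAT": 50, "grade": 4.0},
    {"system": "Traditional", "gender": "F", "MSAT": 30, "grade": 2.5},
    {"system": "Traditional", "gender": "F", "MSAT": 40, "grade": 3.1},
    {"system": "Traditional", "gender": "F", "MSAT": 50, "grade": 3.7},
]

fits = fit_by_cell(rows, "MSAT")
for cell, (slope, intercept, r2) in sorted(fits.items()):
    print(cell, round(slope, 3), round(intercept, 2), round(r2, 2))
```

Comparing intercepts across cells, as the text does for MSAT, amounts to comparing the predicted grade of a student at the low end of the predictor in each system.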

Slope values for HSSCI and MSAT indi- 
cated that more able and better prepared 
students in both instructional systems 
earned higher grades. The intercepts for 
MSAT indicated that the low-ability female 
in the self-paced section was performing a 
full grade higher than her counterpart in the 
traditional section. Another interesting 
result is that PAS males of all abilities re- 
ceived one half a grade higher than their 
traditionally taught counterparts. The 
self-paced system is apparently compensa- 
tory, and the results suggest a tendency for 
this group to overcome a lack of ability and 
educational background. Coefficients from 
the regression of grade on TAQ suggest that
anxiety over exam performance significantly
hinders the performance of the female stu- 
dents in both instructional systems. 

The slope values from the regression of 
grade on effort yielded an unexpected result. 
Analysis of the performance of females in the 
traditional system reveals a negative rela- 
tionship between grade and effort. This 
result indicates that increased study effort 
is associated with increasingly poorer per- 
formance for females in a traditional in- 
structional system. No negative relation- 
ship was found between effort and grade for 
females in the self-paced system or for males 
in either system. 

In general, Table 2 suggests that the ef- 
fects of the individual predictors of grade are 
consistent with expectations. Educational 
background, ability, and perseverance are 
positively related to performance, while test 
anxiety has a negative influence. The ex- 
ception to this pattern was the negative re- 
lationship between effort and grade for fe- 
male students in the traditional section. 

This level of analysis suggests that self- 
pacing may be a more pedagogically desir- 
able instructional method, and that the 
traditional system of instruction may ac- 
centuate the negative influences of lack of 
educational background, low ability, and
high test anxiety for the female student.
The analysis, however, is simplistic. The
effect of each variable was considered singly
and not in combination with the others.
The learning process is enormously complex
and involves the interactions of these and
many other variables. Given this complexity,
a better mode of analysis would involve
methods that consider the effects of the
variables in combination. Path analysis was
chosen as a method for further investigation
of these data.

Path analysis is a statistical technique 
whereby variables can be considered simul- 
taneously, and their combined effects on an 
outcome can be analyzed into direct effects 
and indirect effects. An example would be 
the direct effects of anxiety on performance 
and the indirect effects of anxiety, through 
a mediator such as perseverance. To analyze
the combined direct and indirect effects
of the predictors on performance, zero-order
correlations and standardized partial
regression coefficients are computed. The
magnitude of direct effects is determined by
path coefficients, which are standardized
partial regression weights (Blalock, 1964).
The zero-order correlations among HSSCI,
MSAT, TAQ, effort, and grade within each
gender and instructional system appear in
Table 3. The path coefficients are shown in
Figure 1. Although path analysis allows one
to choose among many models, the causal
path shown in the figure was chosen on the
basis of each variable's precedence in time
and the nature of the instructional systems
under investigation.
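
The split into direct and indirect effects can be illustrated with a deliberately reduced three-variable model: anxiety (x) affects grade (y) directly and also through perseverance (m). The sketch below is an assumption-laden simplification, not the model in Figure 1, which also includes HSSCI and MSAT; the function name `path_coefficients` is hypothetical. The input correlations are the PAS female entries for TAQ, effort, and grade in Table 3, so the coefficients it yields should not be expected to match those in the figure.

```python
def path_coefficients(r_xy, r_xm, r_my):
    """Path coefficients for the model x -> m -> y with a direct x -> y path.

    Inputs are zero-order correlations. With standardized variables, the
    paths into y are the standardized partial regression weights of y on
    x and m, and the path x -> m is simply r_xm (m has one predictor).
    """
    denom = 1.0 - r_xm ** 2
    direct = (r_xy - r_xm * r_my) / denom   # x -> y, holding m constant
    m_to_y = (r_my - r_xm * r_xy) / denom   # m -> y, holding x constant
    indirect = r_xm * m_to_y                # x -> m -> y
    return {
        "direct": direct,
        "x_to_m": r_xm,
        "m_to_y": m_to_y,
        "indirect": indirect,
    }


# Zero-order correlations for PAS females as read from Table 3:
# TAQ-grade, TAQ-effort, effort-grade.
paths = path_coefficients(r_xy=-0.35, r_xm=0.23, r_my=0.07)
total = paths["direct"] + paths["indirect"]
print({name: round(value, 3) for name, value in paths.items()})
print("direct + indirect =", round(total, 3))
```

In this reduced model the direct and indirect effects sum to the zero-order correlation between x and y, which is a convenient check on the arithmetic.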


Results from Path Analysis 


The data in Figure 1 and Table 3 indicate
that educational background (HSSCI) has
a significant positive relationship to
measured ability in all groups, as would be
expected. The direct effects of educational
background are not surprising, since it is
logical that better prepared students should
be more capable and confident about
performance.

Figure 1. Path analysis of determinants of grade
earned in the course. (Path coefficients above the line
are for students in the Phase Achievement System
[PAS], and those below are for traditional students.
Males are to the left of the vertical line, and females are
to the right. Coefficients significant at p < .05 include
the following: HSSCI to MSAT, all except PAS
males; HSSCI to effort, traditional males; MSAT to
effort, traditional males and females; TAQ to
effort, PAS males; MSAT to grade, all coefficients;
effort to grade, all except traditional females; TAQ to
grade, traditional and PAS females. Abbreviations
are as follows: HSSCI = high school science credits;
MSAT = Minnesota Scholastic Aptitude Test; TAQ
= Test Anxiety Questionnaire.)

The direct effect of ability on performance
is positive in both systems regardless of
student gender. The effect of ability on
perseverance, however, is different between
the two systems. In the self-paced system,
ability is not related to perseverance,
whereas in the traditional system, ability has
negative direct effects on perseverance.
This suggests that the more able students in
the traditional system do not exert a great
deal of effort and rely on their ability to
achieve a high level of performance. This
learning strategy leads to a relatively high
level of performance by high-ability females
because perseverance is not positively
related to performance for females in the
traditional system. The same can be said for
males in the traditional system. Even
though perseverance is positively related to
performance for males in the traditional
system, the high-ability male still does not
spend much time in the learning process.

The effects of test anxiety on performance
are more complex than those of ability on
performance. For males, test anxiety does
not significantly affect performance in either
instructional system. The significant
positive coefficient from anxiety (TAQ) to
perseverance (effort) suggests that test
anxiety has considerable influence on the
performance (grade) of males in PAS
through the mediator perseverance.
High-test-anxious males in both systems of
instruction work harder and achieve a higher
performance level. In contrast, the direct
effects of test anxiety (TAQ) on performance
(grade) for females are negative in both
instructional systems. However, the
noncompetitive, compensatory nature of the
self-paced system appears to allow
high-test-anxious females to perform at a higher
level through perseverance. No such
influence of test anxiety (TAQ) on perseverance
(effort) occurs in the traditional system.

Thus, the students in the two systems seem
to adopt different strategies for dealing with
the instructional system: Males in the
traditional system rely on their ability to
achieve a high grade, while both males and
females in the self-paced system rely on
either perseverance or ability. This suggests
that the traditional system of instruction
with its emphasis on educational
background, ability, competitiveness, and
performance on crucial tests penalizes the
high-test-anxious female.


Table 3
Zero-Order Correlation Matrix Within a
Gender and System Combination^a

Variable         1      2      3      4      5      n

Phase Achievement System

1. Grade        --    .56    .28    .21    ...     99
2. MSAT        .44     --    .21   -.09   -.35     93
3. HSSCI       .23    .32     --   -.01    .04    106
4. Effort      .07   -.20    .23     --    .21    106
5. TAQ        -.35   -.46   -.14    .23     --    106

n              101     92    105    105    105     --

Traditional

1. Grade        --    .50    .18    ...   -.09    115
2. MSAT        .59     --    .27   -.31   -.27    102
3. HSSCI       .30    .45     --    .06   -.15    115
4. Effort     -.19   -.26    .06     --    .17    115
5. TAQ        -.38   -.41   -.06   -.03     --    115

n               92     86     93     93     93     --

Note. With df = 84 (smallest n - 2), the minimum correlation
for significance at p < .05 is .22. MSAT =
Minnesota Scholastic Aptitude Test; HSSCI = high school
science credits; TAQ = Test Anxiety Questionnaire.
^a Correlations for male students appear above the main
diagonal; those for female students appear below.
Up to this point in the analysis we have
focused on the effects of the predictors on 
grade as a representative variable from the 
cognitive domain. Outcomes from the 
learning process, however, involve not only 
cognitive changes but also changes in the 
affective domain. As pointed out earlier, we 
attempted to measure the affective domain 
through an attitudinal questionnaire dis- 
tributed to the students. 


Relation of Gender and Instructional 
System to Attitude Reactions to Course 
Format 


A principal-components factor analysis of 
the attitudinal survey yielded clusters of 
items that were highly correlated with the 
questionnaire. Because it was administered 
to students expected to differ in their re- 
sponses to the items (Instructional System 
X Student Gender), the following procedure 
was used to remove the effects of between- 
cells variation on the factor loadings. Sums 
of squares and cross-product matrices were 
computed within cells and then pooled. The 

correlation matrix produced by this proce- 
dure gives the best indication of the interi- 
tem correlations for the total population of 
students responding to the questionnaire. 
The principal-components solution was ro- 
tated by varimax procedures (Kaiser, 1952) 
to simple structure. The rotated factor 
pattern indicated five clusters of items, 
which met the following criteria: (a) 55% of 
the variance among the items was accounted 
for; (b) eigenvalues for the five factors were 
all greater than 1.0; (c) items tended to load 
on a single factor; (d) question content was
similar. The five factors were named Study 
Habits, General Evaluation, Tests and 
Grades, Perceived Freedom, and Intellectual 
Value. In order to evaluate the affective 
consequences of the instructional systems, 
subscale scores on each factor were con- 
structed for each student. All questions that 
were negatively keyed were reflected prior 
to the formation of subscale scores. 
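
The pooling step described above (sums of squares and cross-products computed within each Instructional System X Student Gender cell, then pooled before correlations are formed) can be sketched as below. This is only an illustration under stated assumptions: the function name `pooled_within_correlation` and the item scores are hypothetical, and the later principal-components extraction and varimax rotation are not shown.

```python
from math import sqrt


def pooled_within_correlation(cells, i, j):
    """Correlation between items i and j from pooled within-cell SSCP.

    `cells` is a list of groups; each group is a list of tuples of item
    scores. Deviations are taken from each cell's own means, so mean
    differences between cells do not contribute to the correlation.
    """
    s_ii = s_jj = s_ij = 0.0
    for group in cells:
        n = len(group)
        mean_i = sum(row[i] for row in group) / n
        mean_j = sum(row[j] for row in group) / n
        for row in group:
            d_i, d_j = row[i] - mean_i, row[j] - mean_j
            s_ii += d_i * d_i
            s_jj += d_j * d_j
            s_ij += d_i * d_j
    return s_ij / sqrt(s_ii * s_jj)


# Hypothetical scores on two items for two cells whose means differ
# sharply on the first item (invented for illustration only).
cell_a = [(1, 1), (2, 2), (3, 3)]
cell_b = [(11, 1), (12, 2), (13, 3)]

within_r = pooled_within_correlation([cell_a, cell_b], 0, 1)
# Treating all students as one group lets the between-cell mean
# difference distort the interitem correlation.
total_r = pooled_within_correlation([cell_a + cell_b], 0, 1)
print(round(within_r, 3), round(total_r, 3))
```

In this toy case the two items move together perfectly inside each cell, which the pooled within-cell correlation recovers, while the correlation computed over the undivided sample is much weaker.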

An Instructional System X Student Gender
multivariate analysis of variance (MANOVA)
was performed employing the subscale
scores as correlated dependent measures.
This overall analysis of the questions
indicated a highly significant effect of
instructional system, F(5, 391) = 14.69, p <
.0001, with the self-paced system being
evaluated more positively than the
traditional system by both male and female
students. Subscale analyses using univariate
analyses of variance indicated that positive
attitudes occurred primarily on the
Perceived Freedom, F(1, 395) = 10.79, p <
.0002, and Tests and Grades factors, F(1,
395) = 49.15, p < .0001. The same effect of
instructional system was found on the other
three factors, but these effects were not
statistically significant.

Figure 2. Path analysis of the determinants of student
opinion about the course. (Path coefficients above the
line are for students in the Phase Achievement System
[PAS], and those below are for traditional students.
Males are to the left of the vertical line, and females are
to the right. Coefficients significant at p < .05 include
the following: MSAT to affective reaction, all except
traditional females; TAQ to affective reaction,
traditional females. Abbreviations are as follows: MSAT
= Minnesota Scholastic Aptitude Test; TAQ = Test
Anxiety Questionnaire.)

A path analysis of the factors influencing
the affective scores was performed. The
model in Figure 1 was modified by inserting
the sum of the individual item scores from
the questionnaire (affective reaction) as the
dependent variable in place of grade (see
Figure 2). The earlier portions of the model
have been omitted, since they are the same
as in the previous figure. The path
coefficient from MSAT to affective reaction
indicates that high-ability males and
persevering males in both systems appreciate the
course. This is not too surprising, since males
perform at a higher level as a result of both
their ability and their perseverance (see Figure
1). The more significant aspect of this
analysis is the negative relationship between
TAQ and affective reaction for traditionally
instructed females. Previous data have
shown that these females tend to expend a
great amount of time on the course but
receive a low grade. This frustrating situation
probably colors their perception of the
course.

Reactions to the instructor. Not only 
does the affective domain include reactions 
to the course format but it also includes the 
student’s perception of the instructor. 
Consequently, 10 questions about the stu- 
dents’ opinion of the instructor were in- 
cluded on the attitudinal survey. A princi- 
pal-components factor analysis of the 
questions was performed. The solution in- 
dicated a general factor that accounted for 
48% of the variance among the items. The 
items were then used as correlated depen- 
dent measures in an Instructional System X 
Student Gender MANOVA. The results 
indicated a strong effect of instructional 
system, F(10, 386) = 4.11, p < .0001, and an
effect of student gender, F(10, 385) = 2.09,
p < .02, with the instructor being evaluated
more positively in the self-paced system and
more positively by females. Apparently
students in the self-paced system perceive
the instructor in a more constructive role.


General Discussion 


The study reported here is unique in its
evaluation design because individual student
differences in educational background,
ability, test anxiety, and perseverance have
been employed as predictors of individual
student grades and attitudes within
instructional systems to determine if the same
predictors apply in the same manner to both
instructional systems. There is thus
increased understanding of the interaction of
student characteristics with the properties
of different instructional systems in
determining student learning strategies,
performance, and attitudes, and the problems of
direct comparisons of average student
performance and attitudes are eliminated.
This type of evaluation design is predictive
as well as descriptive and, because of its more
individual nature, allows suggestions to be
made for educational counseling that are not
possible from knowledge of responses
ascribed to the HAS. Before these suggestions
are presented, a summary of the relevant
findings is necessary.


Summary of Results 


In this study, several levels of analysis, 
culminating in a path analysis, were used to 
predict the level of any student’s perfor- 
mance and attitudes given that his or her 
educational background, ability, anxiety 
level, and perseverance are known. In gen- 
eral, perseverance was positively related to 
performance in the mastery learning sys- 
tems, whereas in the traditional system this 
relation was found only for males. The fe- 
male in the traditional system did not receive 
a grade commensurate with the time she 
spent in the learning process. Ability, di- 
rectly and positively, influenced both per- 
formance and attitudes toward the instruc- 
tional system independently of the other 
variables in the evaluation model. However, 
the indirect effects of ability on performance 
and attitudes were determined by the other 
variables in the evaluation model. In the 
self-paced system, for instance, ability was 
not significantly related to perseverance, 
whereas in the traditional system, these two 
variables were significantly negatively re- 
lated. Thus, in the traditional large lecture 
section with its fixed schedule of testing, the 
high-ability student was able to expend less 
time in the learning process than, and obtain 
a greater reward than, the low-ability stu- 
dent, an apparent inequity in the traditional 
instructional system. In the self-paced 
system, ability is less important, and student 
study time becomes a significant predictor 
of performance. Therefore, grades assume 
different meanings in the two instructional 
systems. In the self-paced system, grades 
reflect study time (and presumably learn- 
ing), whereas in the traditional instructional 
system, they are reflecting previously dem- 
onstrated ability. The choice of which in- 
structional system to use appears to be a 
choice of whether grades should represent 
achievement or ability.

Test anxiety was found to be an important 
variable in the evaluation of the instructional 
systems studied here. In general, female 
students show higher test anxiety than male 
students, and this anxiety is increasingly 
negatively related to performance in school
as the student pursues higher degrees of ac- 
ademic excellence. In fact, the academic 
superiority of females over males found in 
grade school is reversed in the competitive 
college environment. Recent experimental 
studies have been concerned with the per- 
formance of males and females in achieve- 
ment-oriented settings involving competi- 
tion against others rather than against an 
abstract standard (Peplau, 1976). The re- 
sults suggest that females tend to avoid 
competition with others, especially compe- 
tition with males in a male-dominated field 
like the biological sciences (Walberg, 1969). 
When forced to compete against males in a 
traditionally male subject, highly anxious 
females perform less well than their mea- 
sured capabilities would predict (Maccoby 
& Jacklin, 1974; Mednick, Tangri, & Hoff- 
man, 1975). 

In this investigation, test anxiety was 
negatively related to female performance 
and modified the amount of time females 
spent in the learning process. In the tradi- 
tional system, high test anxiety was associ- 
ated with reduced perseverance; in the 
self-paced system, it was associated with 
increased perseverance for females com- 
pared with males. The compensatory, 
noncompetitive nature of the self-paced 
system apparently led to a pedagogically 
desirable relationship between test anxiety 
and perseverance for female students. As 
noted above, the debilitating effects of anx- 
iety on academic achievement for females 
are well documented by experimental evi- 
dence linking increased anxiety to decreased 
performance at intellectual tasks (Cattell, 
1966; Spielberger, 1966). 

The differences for instructional systems 
and the indirect effects of test anxiety on 
performance through perseverance reported 
here have not been previously described. 
However, Martin (1970), and Martin and 
Meyers (1974) found that in extremely 
stressful situations, anxiety manifested 
during preparation for an exam had a de- 
bilitating effect on quality of study and 
subsequent performance. In situations
judged as less stressful, anxiety during
preparation was significantly positively
related to study effort outside of class. Mayo
(1970) has predicted that a self-paced system
will benefit a highly test anxious student,
and Allen, Giat, and Cherney (1974) have
reported a steady significant reduction in
self-reported anxiety for females during a
semester course in psychology taught by
self-paced methods. In spite of this
evidence, the question of whether self-pacing
directly modifies performance or simply
reduces test anxiety that leads to better
performance cannot be answered at this time.
Nevertheless, the present data imply that
instructional systems can be tailored to the
characteristics of the student without
elaborate academic changes.


Counseling Students 


The empirical outlook taken here indicates
that matching student and instructional
system characteristics can provide an
optimal learning environment for each
student. The question now is, “What students
go with what instructional systems?”
Obviously a comprehensive answer cannot be
given at this time, but on the basis of these
data low-ability, highly test anxious students
should be counseled to take courses taught
by a self-pacing method. This recommendation
applies to all students, but especially
holds for the low-ability, highly test anxious
female. Unfortunately, most students at
large universities are never adequately
counseled about a course of study. [...]
advisors never see their advisees. Given this
state of affairs, it would be interesting to
determine what would happen if students
were allowed to choose the system of
instruction for themselves. Janisse (1973) has
provided some data concerning this question.
Janisse found that when given a
choice, low-test-anxious students will choose
the traditional system more frequently than
high-test-anxious students. A psychologist
familiar with learning theory would explain
this by saying that high-test-anxious students
are avoiding a situation that may
increase their test anxiety. This explanation
is consistent with the present data. It may
be that counseling students about instructional
systems is not necessary once they are
familiar with the characteristics of available
instructional systems. In that case, advertising
rather than counseling may suffice.


General Summary


The correlational results reported here 
corroborate the experimental results dem- 
onstrating a negative relationship between 
anxiety and performance for females and 
suggest that a self-paced system of instruc- 
tion allows the female student to realize her 
intellectual potential in spite of her high 
anxiety about competing against others. 
This beneficial effect of self-pacing is prob- 
ably due to reduced test anxiety for the fe- 
male student because of the opportunity to 
retake unit examinations. The consistency 
of these correlational results with past ex- 
perimental findings increases the confidence 
that can be placed in the results of controlled 
laboratory studies of the relations among 
instructional systems, gender, test anxiety, 
ability, perseverance, and performance. 
This is significant, since an ideal goal of ed- 
ucational research is to validate laboratory 
findings in correlational field studies. 
Maximum validity of our findings is ob- 
tained only when researchers move back and 
forth between the control of the laboratory 
and the complexity of the classroom. Nat- 
urally, we assume that our evaluation model 
and results are not specific to freshman 
biology or our particular university and that 
they can be generalized to any content area
or university faced with the problem of
instruction and evaluation of large college
classes. We hope the present investigation
has made a positive contribution toward
reaching the goals of instructing individual
students and evaluating instructional
methods in large college classes.


Relationships Between Associative and Content
Structure of Physics Concepts 


Mary Patricia Thro 
Maryville College 


This study investigated the development of the associative structure of phys- 
ics concepts as a result of the content presented to the learner and the relation- 
ship between associative structure and academic achievement. Thirty 
subjects were divided into a treatment group (N = 19), who received physics 
instruction, and a control group (N = 11), who did not. Experimental 
subjects’ associative structures indicated significantly greater correlation with 
the instructor’s associative structure after exposure to course content than 
previously; experimental subjects also were more similar to the instructor than 
control subjects. Comparison of regression models showed that associative 
structure itself contributes significantly to the prediction of achievement. 
Results suggest that assessment of associative structure may allow insight into
a learner’s progress which complements achievement testing.


An instructor’s daily challenge includes 
guiding students toward the realization of 
their intellectual potential. The frequency 
with which a teacher’s attention is focused 
on this task makes it a vital issue and the
subject of much research. The question is
not only one of content to be mastered, but
also of correspondence of associative structure
of students with the established content
structure of a course. Assuming that an
instructor knows the ideal content to be
learned by a student, how is the student’s
progress in acquiring that material efficiently
determined? Achievement tests are
not totally satisfactory, for if a student scores
poorly on such a test, does the teacher conclude
that the associative structure is totally
undifferentiated vis-a-vis the given content?
Where does the instructor begin attempting
to remedy the deficiency and thus continue
guiding the student toward the goal?

The present study generates perspectives
for a response to the question of effective
instructional communication by investigating
the development of students' cognitive
representation of physics concepts
considered as a result of the content presented
to the learner. Measuring students’
associative structure at specified intervals
and relating these data to an established
content structure enables one to trace the
development of concepts in students’ associative
structure. The relationship between
achievement and associative structure was
also investigated, because an established
associative structure seemed prerequisite to
solving problems. By precisely describing
the interdependence of content and associative
structure, the study provides bases
for future efforts to determine how teachers
can communicate knowledge to learners
most effectively.


This article is based on a doctoral dissertation completed
at the Graduate Institute of Education, Washington
University, St. Louis, Missouri. The author
gratefully acknowledges the assistance with statistical
analyses provided by David Weldon.
Requests for reprints should be sent to Mary Patricia
Thro, Maryville College, 13550 Conway Road, St. Louis,
Missouri 63141.


Associative structure refers to the pattern 
of relationships among concepts established 
in long-term memory. Word associations 
are assumed to derive in whole or in part 
from cognitive structures in semantic 
memory. In such research, frequency of 
responses played a prominent role in speci- 
fying cognitive structure (Deese, 1959, 1962; 
Johnson, 1964, 1965, 1967, 1969; Noble, 1963; 
Rothkopf & Thurner, 1970). However, 
Deese introduced a definition of associative 
meaning which claimed that two stimuli 
have the same meaning when they elicit responses
in common. This definition enabled
Johnson to use Garskof and Houston’s
(1963) relatedness coefficient to establish
relationships between stimuli eliciting lists 
of words. This relatedness coefficient is a 
function both of the number of common 
words and of the order of appearance of the 
common words in word-association lists. 
While this index measures relationships 
between stimuli, it does not measure mean- 
ing directly. 

The validity of the word-association 
technique as a measure of cognitive structure 
was investigated by Shavelson (1974). In his 
study of physics concepts, the median rela- 
tedness coefficient on a pretest was zero for 
both treatment and control groups. On the 
posttest, the concepts were significantly 
more related in the treatment group compared
with the control group. The treatment
subjects’ word associations changed
qualitatively and evidenced constraint by
concepts in Newtonian mechanics. Shavelson
concluded that students in the
treatment group were learning about the 
structure of the key concepts as defined in 
physics. Further evidence for the cognitive 
structure interpretation of word-association 
data is provided by Deese (1962), Shavelson 
(1972; Shavelson & Stanton, 1975), and Cox, 
Johnson, and Curran (1970). 

Content structure consists of the definite
interrelationships among concepts which are 
contained either in written or oral instruc- 
tional material. In a content domain such 
as physics, frequency of occurrence has been 
used to quantify the structure (Johnson, 
1967, 1969; Rothkopf & Thurner, 1970). In 
the present study, a frequency tally of con- 
cepts in the energy chapter of a physics text 
(Cromer, 1974) was used as the basis of se- 
lecting the key concepts of the study. Since 
the text is actually a representation of the 
cognitive structure of its author (Shavelson 
& Stanton, 1975, p. 79), and because the 
teacher lectured from the text, it seemed 
reasonable to use the cognitive structure of 
the teacher, obtained from word-association 
tests, to serve as the model of content 
structure. Thus, in this study, content and 
cognitive structure were both specified by 
means of word-association analyses. However,
because there is surplus meaning attached
to cognitive structure, associative
structure is a more appropriate term in the
present study.

The question of relating course content to
the associative structure of the subjects was
made explicit by the following two hypotheses:
(a) The development of a physics
concept in semantic memory varies directly
with the course content to which the learner
is exposed; (b) a learner’s performance on
achievement tests demanding application of
physics content is a function of how closely
the student’s associative structure matches
that of the instructor.


Method 


Subjects 


Thirty subjects were divided into a treatment group
(N = 19) and a control group (N = 11). The treatment
group was composed of 15 female and 4 male students
enrolled in a general physics course at a private college.
The 11 subjects in the control group were students not
enrolled in a physics course who had little experience
in the sciences and who were willing to participate in the
study.


Material 


Cromer's (1974) text, Physics for the Life Sciences,
was required for the course in which the experimental
subjects were enrolled, and it provided the basic content
for the study. Lectures based on this text were presented
to the experimental group 4 hours each week
throughout one semester of an academic year.

Physics concepts. Word-association data produced
by the instructor of the course served as a norm for
content structure. Seventeen stimulus words were
identified for word-association tests from the energy
chapter of Cromer's (1974) text. That chapter was
chosen as the focal point from which to select the
concepts of the study because of the centrality of the
concept energy to the development of all physics.
These words, which spanned the range of frequency of
occurrence of physics concepts in the chapter, were
representative of a hypothesized development of
concepts in associative structure for this study, in which
six clusters formed the following order: kinematics,
dynamics, energy, machines, change of phase, and heat.
The seventeen concepts were distance, time, speed,
acceleration, force, mass, weight, gravity, friction, energy,
kinetic, potential, mechanical, work, heat, electricity,
and atom. The order of these words was varied
randomly for each administration of the word-association
test.

In order to quantify the word-association data, each
of the five tests was converted into a matrix of relatedness
coefficients according to the procedures outlined by
Garskof and Houston (1963). The relatedness coefficient
(RC) was defined as follows:

$$RC = \frac{\bar{A} \cdot \bar{B}}{(A \cdot B) - [n^{p} - (n - 1)^{p}]^{2}}$$

where $\bar{A}$ and $\bar{B}$ represent the vectors of the rank order
of words under A that are shared in common with B and
the rank order of words in B that are shared with A;
$A \cdot B$ represents the vector of the rank order of words
in A multiplied by the vector of the rank order of words
in B; n represents the number of words in the longer list; and
p = 1, in this study. A matrix representation of the
instructor's associative structure provided the model 
for the content structure of the study. The possibility 
of establishing models for content structure rests on the 
assumption that a subject matter structure as repre- 
sented in the minds of experts such as authors of texts 
and teachers is fairly congruent. The subjects took the 
same word-association tests as the instructor. The 
matrix generated from each word-association test (five 
per subject) served as the data for the analyses de- 
scribed in the results section. 
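
As an illustration of how a relatedness coefficient of this kind can be computed, the following Python sketch implements the Garskof and Houston (1963) formula under stated assumptions. It is not the procedure used in the study: the function name, the convention that each list begins with its own stimulus word, the ranking from n down to 1, and the reading of A·B as the product of two complete rank vectors of length n are all assumptions introduced here.

```python
def relatedness_coefficient(list_a, list_b, p=1):
    """Sketch of a Garskof-Houston style relatedness coefficient.

    Assumptions (not stated in the article): each list starts with its
    own stimulus word, and ranks run from n (first word) down to 1.
    """
    n = max(len(list_a), len(list_b))  # number of words in the longer list
    # Rank weight of each word: the first position receives n**p.
    rank_a = {word: (n - i) ** p for i, word in enumerate(list_a)}
    rank_b = {word: (n - i) ** p for i, word in enumerate(list_b)}
    # Numerator: products of ranks for the words the two lists share.
    shared = rank_a.keys() & rank_b.keys()
    numerator = sum(rank_a[w] * rank_b[w] for w in shared)
    # Denominator: product of two complete rank vectors of length n,
    # reduced by the correction term [n^p - (n - 1)^p]^2.
    full_product = sum((k ** p) ** 2 for k in range(1, n + 1))
    denominator = full_product - (n ** p - (n - 1) ** p) ** 2
    if denominator == 0:
        return 0.0  # single-word lists carry no relational information
    return numerator / denominator


# Hypothetical response lists for the stimuli "force" and "mass".
force_list = ["force", "mass", "weight", "gravity"]
mass_list = ["mass", "force", "weight", "gravity"]
print(relatedness_coefficient(force_list, mass_list))
```

Under these assumptions with p = 1, two four-word lists that differ only in the order of their first two words give a coefficient of 1, and lists with no words in common give 0.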

Achievement tests. Achievement was measured by 
performance on tests that were constructed for the 
study. Two forms of a multiple choice general 
achievement test for the concept of energy, with 26 
items in each form, were compiled for pre- and post- 
testing. The hypothesized developmental pattern of 
the energy concept was used to develop achievement 
tests. The achievement tests were composed, first, by 
identifying basic equations for each of the topics in the 
development of the concept energy. Two multiple 
choice questions then were formulated for each of the 
identified equations. One question was assigned to 
Form A, and one was assigned to Form E. This insured 
comparable content for the two tests. For example, in 
Form A, Question 12 was, “A stone dropped from a 
bridge strikes the water 2.5 seconds later. How high is 
the bridge?”, while in Form E, Question 12 read, “A ball 
is dropped from a window 64 ft. above the ground. How 
long does it take to reach the ground?” Both of these
questions presume familiarity with the equation $d = \tfrac{1}{2}gt^{2}$.
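
The two sample items can be worked through numerically. The sketch below assumes free fall from rest and a gravitational acceleration of 32 ft/s², a value that is not stated in the text; the function names are hypothetical.

```python
import math

G_FT_PER_S2 = 32.0  # assumed value of g in ft/s^2; not given in the article


def fall_distance(t_seconds, g=G_FT_PER_S2):
    """Distance fallen from rest after t seconds: d = (1/2) g t^2."""
    return 0.5 * g * t_seconds ** 2


def fall_time(d_feet, g=G_FT_PER_S2):
    """Time to fall d feet from rest: t = sqrt(2 d / g)."""
    return math.sqrt(2.0 * d_feet / g)


print(fall_distance(2.5))  # Form A, Question 12: prints 100.0 (feet)
print(fall_time(64.0))     # Form E, Question 12: prints 2.0 (seconds)
```

Both items rest on the same relation, solved once for distance and once for time, which is the sense in which the two forms have comparable content.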

The apparent difficulty of questions was also varied
for each of the topics. Questions involving definitions,
simple applications, and problem solving ability were
used. In arranging the questions on the test itself, the
simpler questions from all the topics were positioned
at the beginning of the test, while the more difficult ones
were at the end. Half of the subjects in each group were
pretested with Form A and the other half with Form E.
Each student received the alternate form for the posttest.
Reliability for Test A was .68, while that for Test
E was .75. These modest reliability scores reduce the
power of the achievement tests somewhat.


Procedure 


The data were collected throughout the first semester
of an academic year. During the first class period, both
the first word-association test and a general achievement
pretest, Form A or E, were administered to the
experimental group. The instructions for the word-association
test asked the subjects to restrict their responses
to physics words and indicated that 1 minute
would be allowed for each stimulus word. The control
group, not being in the class, took the tests within several
days of the experimental group’s schedule.
Cromer's (1974) text and the lectures presented the key
concepts in a detailed fashion according to the hypothesized
developmental pattern of the concept energy.
Thus, exposure to the kinematics concepts of
distance, time, speed, and acceleration preceded instruction
on the dynamics or force concepts of mass,
weight, gravity, and friction. The energy concepts of
energy, potential, kinetic, mechanical, and work followed
the kinematics and force sections, and were defined
in terms of the preceding concepts. Machines,
change of phase, and heat concepts were presented as
applications of energy concepts, and therefore were
presented subsequent to the energy lectures.

As the sections dealing with the hypothesized stages 
in the pattern of the development of the energy concept 
were completed, the word-association tests to measure 
cognitive structure were administered. This sequence
of instruction periods followed by testing occurred three 
times during the semester for the experimental group. 
The word-association tests were administered to the 
control group within several days of the experimental 
group’s schedule. At the end of the semester, the fifth 
word-association test (Form E) and the general 
achievement test (Form A or E) were administered to 
all subjects.

To provide a content structure model, the instructor 
also took each form of the five word-association tests 
within several days of administering the tests to the 
subjects. In order to provide a complete model with 
which to monitor the development of associative 
structure of the subjects, the instructor did not attempt 
to restrict responses on any of the five tests to the 
subject matter currently being covered in the classes.
Thus, any one or a combination of all five word-association
tests might provide the model for the subjects’
data. Euclidean distance scores¹ between the five
matrices formed from these word-association tests 
ranged from .13 to .17, thus verifying the stability of the 
instructor's responses. 
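
The distance score used for this stability check is defined in Footnote 1: square each difference between corresponding elements of two matrices, sum the squares, take the square root, and divide by the number of off-diagonal elements. A minimal Python sketch of that computation follows. The function name is hypothetical, and restricting the sum to off-diagonal cells is an assumption, since the footnote does not say how diagonal cells were handled.

```python
import math


def matrix_distance(m1, m2):
    """Scaled Euclidean distance between two square matrices.

    Assumption: only off-diagonal cells enter the sum of squares.
    """
    size = len(m1)
    squared_sum = 0.0
    for i in range(size):
        for j in range(size):
            if i != j:
                squared_sum += (m1[i][j] - m2[i][j]) ** 2
    off_diagonal_count = size * size - size
    return math.sqrt(squared_sum) / off_diagonal_count


# Hypothetical 2 x 2 example: differences of 3 and 4 off the diagonal.
first = [[0.0, 0.0], [0.0, 0.0]]
second = [[0.0, 3.0], [4.0, 0.0]]
print(matrix_distance(first, second))  # sqrt(9 + 16) / 2 = 2.5
```

For the 17 × 17 matrices of the study the divisor would be 272, so small scores such as those reported indicate that the instructor's five matrices differed little from one another.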


Results and Discussion 
Establishment of the Content Model 


Measures of content structure were initially
provided by the five 17 × 17 matrices
of relatedness coefficients constructed from
the instructor’s five word-association tests.
These matrices were analyzed to determine
the single most satisfactory model obtainable
from them. They were submitted to a
multidimensional scaling procedure² for
structural analysis, since this technique
generates a spatial representation of the relationships
among the concepts. The stress
or standard error of estimate values, obtained
for each matrix for 2-, 3-, 4-, and 5-dimensional
scaling solutions, are reported
in Table 1. The 5-dimensional solutions
were used in this study in order to allow the
model to account for the maximum amount
of variance relative to the number of concepts
scaled. Additional support for this
decision comes from the fact that the number
of possible groupings of the 17 stimulus
concepts from the word-association tests
exceeds four. Since the 5-dimensional
scaling solution produced by the instructor's
final word-association test had the lowest
stress value (.0269), relatedness coefficients
of this solution became the model for subsequent
analyses in this study.


¹ The Euclidean distance is obtained by squaring each
difference between corresponding elements of two
matrices, summing the squares, taking the square root
of this sum, and dividing by the number of off-diagonal
elements in each matrix (Geeslin, Note 1).


A standard cluster analysis³ was done to
facilitate interpretation of the five dimen- 
sions of the established model and to sub- 
stantiate the hypothesized grouping of the 
concepts. This analysis of the model data 
grouped the 17 stimulus words into three 
clusters with four unclustered concepts re- 
maining. The resulting clusters were (a) 
distance, time, speed, and acceleration, 
which is named the kinematics cluster; (b) 
force, mass, weight, and gravity, which is 
labeled the force cluster; and (c) energy, ki- 
netic, potential, mechanical, and work, 
which is the energy cluster. Friction, heat, 
electricity, and atom were the unclustered 
concepts. These clusters are consonant with
the teaching pattern used to develop the
energy theme in which kinematics was presented
first, force was introduced next, then
energy. The unclustered items are specific
applications of energy which would cluster
more appropriately with words extending
the concept itself than with the general e[...]
grouped together.


Table 1
Stress Valuesᵃ for the 2-5-Dimensional
Scaling Solutions of the Five Model Word-Association Tests

No. of                      Test
dimensions      A       B       C       D       E

2             .1427   .1579   .1700   .1354   .1713
3             .0884   .0805   .1012   .0802   .0892
4             .0551   .0512   .0665   .0453   .0453
5             .0370   .0330   .0397   .0278   .0269

ᵃ Stress is analogous to the standard error of estimate in bivariate regression.


Figure 1. Dimensions one and two of the multidimensional
scaling solution of model relatedness coefficients
with clusters indicated.


Dimensions 1 and 2 of the 5-dimensional
structural analysis of the teacher's relatedness
coefficients model are displayed in
Figure 1, with the clusters indicated on the
graph. By combining these two analyses in
this way, one might label the dimensions
cluster separators. Dimension 1 separates
force-kinematics from energy-heat; Dimension
2 separates force from kinematics,
with friction and speed serving as the anchors,
respectively. Dimensions 3, 4, and 5
basically pick up the concepts not located in
the specified clusters.


² [...] program developed at Bell Telephone Laboratories [...]
Kruskal, Young, and Seery (1973) and Torgerson, [...]
scaling technique is at the regression level and [...]
with the problem of representing n objects geometrically
by n points, so that the interpoint Euclidean distances
correspond in some sense to experimental [...]larities
between objects.

³ The OSIRIS computer program (Center for [...]
Studies, 1973) for cluster analysis was used. The
relatedness coefficients matrix was the datum used [...]
the program, whose parameters include STAR[...],
ENDMIN, and STAYMIN. STAYMIN is the minimum
value for beginning a cluster and was set at [...].
An item will be allowed to enter (or stay) in a cluster
only if its correlation with the previous item [...]
cluster is above ENDMIN (or STAYMIN). [...]
and STAYMIN were each set at .5099.


Figure 2. Mean z scores for each test day indicating
goodness-of-fit of experimental and control groups with
the model. (Horizontal axis: test sessions A through E.)


Approximation of Subjects’ Cognitive 
Structures to the Content Model 


Before testing the hypotheses, it was 
necessary to specify the measure for good- 
ness-of-fit of the subjects’ relatedness coef- 
ficient matrices to the model just estab- 
lished. To measure goodness-of-fit, the 
Pearson correlation of the relatedness coef- 
ficient matrix of each subject for each 
word-association test with the 5-dimensional 
scaling solution of the content model was 
obtained. These correlations were con- 
verted to z scores, using Fisher’s r-to-z 
transformation. Analyses to test the hy- 
potheses were based on these z scores. 
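
The goodness-of-fit measure described above can be sketched in Python as follows. This is an illustration under assumptions, not the study's own computation: the function names are hypothetical, only off-diagonal cells are correlated, and the model is assumed to be available as a square matrix of the same size as the subject's matrix.

```python
import math


def off_diagonal(matrix):
    """Flatten a square matrix, skipping the diagonal cells."""
    size = len(matrix)
    return [matrix[i][j] for i in range(size) for j in range(size) if i != j]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher r-to-z transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)  # raises ValueError when r is exactly 1 or -1


def goodness_of_fit(subject_matrix, model_matrix):
    """z score expressing how closely a subject's matrix tracks the model."""
    r = pearson_r(off_diagonal(subject_matrix), off_diagonal(model_matrix))
    return fisher_z(r)


# Hypothetical 3 x 3 relatedness matrices for one subject and the model.
subject = [[1.0, 0.1, 0.2], [0.1, 1.0, 0.5], [0.2, 0.5, 1.0]]
model = [[1.0, 0.2, 0.3], [0.2, 1.0, 0.4], [0.3, 0.4, 1.0]]
print(goodness_of_fit(subject, model))
```

The transformation is applied because correlations near 1 are compressed on the r scale; z scores are more nearly normally distributed and so are better suited to the analyses of variance that follow.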


Development of Associative Structure 


The development over time of a physics
concept in the cognitive structure of a
learner was investigated by examining the z
scores that specified the goodness-of-fit of
the associative structure of the experimental
and control subjects with the model of content
structure derived from the teacher's
word-association data. Mean z scores for
the experimental and control groups are
graphed in Figure 2. In general, the correlation
of the experimental subjects’ cognitive
structure with the model structure increased
somewhat with time.

A 2 × 5 (Experimental/Control × Test
Periods) analysis of variance with repeated
measures on the last factor was performed.
The interaction of groups with sessions was
significant, F(4, 112) = 3.19, p < .01. The
experimental/control group main effect was
significant, F(1, 28) = 52.59, p < .0001, as
was the sessions main effect, F(4, 112) =
7.40, p < .001. A simple effects test was
applied to the interaction (Test Period x 
Experimental/Control) to determine its 
source. Both the simple effects of trials 
within groups and of groups within trials 
were assessed. The experimental group 
showed a significant change from Trial A to 
Trial B, F(1, 112) = 18.06, p < .001, and from 
the average of Trials A through D to Trial E, 
F(1, 112) = 13.59, p < .001. The control
group showed no significant change over 
sessions. In the simple effects test of groups 
within trials, which is reported in Table 2, 
there was a significant difference between 
the experimental and control groups at all 
five testings. The fact that the groups dif- 
fered at Test A indicates that the groups 
were not comparable initially. To modify 
the analysis to account for this condition, an 
analysis of covariance was carried out with 
Session A serving as the covariate. An initial 
test on homogeneity of regression planes in 
the test sessions indicated no differences 
either within trials or between groups. The 
analysis of covariance confirmed the inter- 
pretation provided by the simple effects 
tests; that is, statistically adjusting for dif- 
ferences in the initial test scores did not 
eliminate subsequent differences between 
the experimental and control groups, F(1, 
27) = 57.33, p < .0001. Hence the statistical 
analyses are all consistent and suggest that 
the development over time of a physics 
concept in the associative structure of a 
learner varies directly with the content to 
which the learner was exposed through in- 
struction. 


Table 2
Analysis of Variance Summary of Simple
Effects Test of Groups Within Sessions

Source                        df        F

Experimental vs. control
  Session A                  1, 28     4.20*
  Session B                  1, 28    35.03***
  Session C                  1, 28    15.01**
  Session D                  1, 28    20.30**
  Session E                  1, 28    25.07***

* p < .05.
** p < .001.
*** p < .0001.


Experimental subjects who were exposed 
to the content approached the model at a 
significantly greater rate than the control 
subjects. Session B is where this effect is 
focused. This indicates that cognitive 
structure was most radically altered in the 
first weeks of exposure to the content of this 
study, a result which provides important 
information for the instructional process of 
this content. Previous studies investigating 
the recall of verbal material provide some 
insight for interpreting this result (Hudgins, 
1977, pp. 23-63). Variables focused on in 
such studies include the position of ideas in 
the logical structure of the passage and rep- 
etition, or the number of exposures of the 
subject to the content. These variables are
relevant to the present study. 

With regard to the positioning of ideas in 
the logical structure of the verbal materials, 
the basic physics concepts, from which oth- 
ers are derived, must be defined and differ- 
entiated at the beginning of a physics course. 
The general physics course of the present
study was structured such that the kinematics
concepts were presented first. From
them, the force concepts were derived, and
the energy concepts depended on both kinematics
and force concepts for their formulation.
The Session B testing occurred
immediately after the kinematics and force
concepts had been presented. The initial
establishment of basic concepts in cognitive
structure would alter it more than would the
learning of subsequent concepts derived
from the fundamental ones.

As for repetition, instruction periods
preceding each of the other test sessions 
provided students with additional exposure 
to the initially presented concepts. But
studies have shown that learning does not
increase by equal increments with greater
numbers of repetitions of the same material
(Meyer & McConkie, 1973). This would
account for the lower slope in the curve
joining the mean z-score correlation with the
model subsequent to Session B. Thus, both
the position of the concepts in the logical
structure of the physics material and the
decrease in incremental change of cognitive
structure with repeated exposure to a given
content material provide information
showing that the greatest change in cognitive
structure occurred at Session B. These
findings converge toward emphasizing the 
critical nature of instruction during the first 
weeks of a course. 


Cognitive Structure and Achievement 


The relationship between the ability of
subjects to solve problems and the establishment
of related concepts in their cognitive
structures was investigated. After an
analysis of variance confirmed that exposure
to instruction resulted in a significant increase
in achievement scores of the experimental
group over the control group, F(1, 26)
= 16.30, p < .001, stepwise regression models
were used to establish the relationship between
achievement and cognitive structure.
The first model predicted Achievement E to
be a function of sex, Achievement A, and
Cognitive Structure E, in the given order. In
this analysis, both Achievement A, F(1, 27)
= 11.77, p < .002, and Cognitive Structure
E, F(1, 26) = 11.20, p < .0025, were significant
predictors of achievement (R² = .51).

A second stepwise model was also tested,
which changed the order of the independent
variables, predicting Achievement E to be a
function of sex, Cognitive Structure E, and
Achievement A, in this order. In this analysis
only Cognitive Structure E is a significant
predictor of Achievement E, F(1, 27) =
21.59, p < .0001.

Sex's contribution to the prediction of
Achievement E was not significant. The
first stepwise regression model provides a
chronological approach and indicates that
both Achievement A and Cognitive Structure
E contribute significantly to Achievement
E. A subject’s success on the final
achievement test is influenced both by the
person’s initial ability to solve problems at
Session A and by the person's cognitive
structure at Session E. The second model
shows Cognitive Structure E as the most
efficient predictor of Achievement E. This
indicates that most of Achievement A's
contribution to Achievement E can be subsumed
by Cognitive Structure E. Thus, the
ability of subjects to solve problems was seen
to vary directly with the establishment of
related concepts in their cognitive structures.


General Discussion 


The value of assessment of associative
structure is its ability to provide insight
into the state of a learner’s progress
beyond that afforded by achievement testing
alone. Investigation of associative structure
in this study has shown that it changes most
radically in the first weeks of instruction.
Tracing differences in the development of
concepts in individuals' associative structures
can assist an instructor in identifying
critical points in the instructional process of
a given content, as well as in detecting improperly
related concepts in a student's associative
structure that are in need of attention.
Those concepts that are not well
differentiated in line with the model need
further exploration by the student. These
data may provide the instructor with a
means of increasing students' comprehension,
since achievement is related to the establishment
of relationships among the
clustered items in associative structure.

Several insights that surfaced during this
study merit further research. Since problem
solving achievement is significantly related
to cognitive structure, achievement might be
heightened by close observation of students’
cognitive structures before the administering
of tests. Additional exposure to content
material might be provided for those whose
clusters were not well differentiated. One
could then study dependent variables such
as change in achievement, comprehension,
and interest to ascertain the merits of such
a practical application of the techniques of this
study.

Since a leveling effect in change of cognitive
structure seemed to occur after Session
B, which may be due to kinematics and force
being the subsuming concepts in physics
instead of energy, it would be valuable to
reorder the presentation of the physics
concepts. Using a text such as Romer's
(1976) Energy, An Introduction to Physics,
which places energy first in the instructional
sequence with kinematics and force as derived
concepts, would help clarify the reason
for the largest change in cognitive structure
being focused in the early sessions. Does
one concept intrinsically subsume another,
or is it just a matter of which one is presented
first?

This study used relationship of concepts 
in associative structure as the predictor of 
achievement. The exact meaning of the 
concepts is another aspect of the question 
that might serve as a predictor. Shift of 
meaning might be measured with time as the 
stimulus for the analysis. Such a study may
be more precise and may account for more of 
the variance in achievement than the present 
study. 

By investigating the relationship of asso- 
ciative structure to content structure, this 
study has attempted to lay foundations for 
future decision making in the instructional 
process area. The value of the associative
structure variable seems to lie in the addi- 
tional insight it provides into the state of a 
learner's progress. This insight may furnish 
teachers with new tools for their task of 
guiding students to the full realization of 
their intellectual potential. 


Reference Note 


1. Geeslin, W. E. Comparison of content structure and 
cognitive structure in the learning of probability. 
Paper presented at the meeting of the American 
Educational Research Association, Chicago, April 
1974.


Nonverbal Communication of Affect in Interracial Dyads


Robert S. Feldman
University of Massachusetts, Amherst


Lawrence F. Donohoe
Virginia Commonwealth University


Two experiments investigated the relationship between nonverbal behavior
and the racial composition of a teacher-student dyad. In Experiment 1, 36
high- and low-prejudiced white subjects, acting as teachers, were led to praise
successful white and black students (confederates). Analysis of samples of
their nonverbal behavior showed that high-prejudiced teachers nonverbally
discriminated between white and black students (favoring whites) significantly
more than low-prejudiced teachers. In Experiment 2, 40 white and black
teachers taught successful white and black students (confederates). Results
showed that both whites and blacks behaved nonverbally more positively to
a student of their own race than to a student of the other race, although only
same-race judges could distinguish the differences in affect displayed by the
subjects.


… first author from Sigma Xi, The Scientific Research
Society of North America, and the Virginia Commonwealth
University Faculty Research Fund. Experiment
… based on a master's thesis submitted to the Department
of Psychology, Virginia Commonwealth
University, by the second author. The authors are
grateful to Steven Hamby, Ronald Campana, and
…therine E. Vorwerk for their aid.
… Feldman, Department of Psychology, University of
Massachusetts, Amherst, Massachusetts 01003.


There is a growing body of literature
concerning the role of nonverbal behavior in
social interaction. It now seems that independent
messages may be carried by the
nonverbal channel of communication, separate
and distinct from verbal discourse. In
fact, nonverbal behaviors sometimes may be
more indicative of an individual's true feelings
than verbal output (Ekman & Friesen,
1974; Mehrabian, 1971).

In this study, nonverbal communication
is examined as it relates to inter- and intraracial
teacher-student dyadic interaction.
The major objective of the research is to
demonstrate the occurrence of differential
nonverbal behavior relating to the race of a
teacher's student, even in the case in which
verbal behavior is held constant.

It seems clear that facial nonverbal behavior
is related quite directly to the affect
being experienced by an individual and that
such affect can be communicated relatively
accurately to even untrained observers (e.g., 
Ekman, 1965; Ekman, Friesen, & Ellsworth, 
1972; Hall, 1964). Moreover, it has been 
shown that nonverbal behavior not only is 
related to a person's affective state but that 
it also can reveal information unwittingly 
that verbally is being withheld from a dyadic 
interactant (Ekman & Friesen, 1969, 1974;
Feldman, 1976). For instance, Feldman 
(1976) found that naive observers could 
distinguish persons who liked a dyadic in- 
teractant from persons who disliked their 
partner, independent of the nature of the 
verbal behavior in the situation. This 
suggests that it may be possible that the 
immediate events within a dyadic interac- 
tion are less influential in determining an 
individual's nonverbal behavior than the 
attitude the individual originally holds re- 
garding the interactant, particularly in cases 
in which the prior attitude is strongly 
held. 

These ideas relate quite directly to inter- 
racial interaction if individuals have strongly 
held, salient attitudes regarding persons of 
different races. This possibility appears to 
be well documented by evidence of both an 
anecdotal and an empirical nature (Allport, 
1954; Simpson & Yinger, 1972; Katz, Note 1). 
Attitudes relating to race are acquired quite 
early in a person's life and are strongly held 
(Katz, Note 1). By the age of 3 or 4 years, 
children are able to differentiate between the 
races, and even at that age there seems to be
some differential affective feeling regarding
blacks and whites.
On the basis of the previous evidence, it 
thus seems reasonable to conclude that first, 
nonverbal behavior reflects an individual’s 
attitude toward a person with whom he is 
interacting. This appears to be true even 
when the individual neither intends for such 
information to be communicated to the in- 
teractant nor is necessarily aware of such 
communication. Second, previous research 
has established that attitudes regarding race 
are particularly strong and affect-arousing. 
Bringing these two sets of findings together, 
it would seem reasonable that attitudes and 
feelings about race would be elicited during 
an interracial teaching interaction and would 
be revealed nonverbally, regardless of the 
verbal behavior of the interactants. Thus, 
even under conditions in which a same-race 
versus a different-race student is behaving 
similarly, leading the teacher to react in a 
verbally identical manner, it may be hy- 
pothesized that differential nonverbal be- 
havior occurs, reflecting the teacher’s degree 
of prejudice. 


Experiment 1 


In Experiment 1, a teaching paradigm was 
used to test the hypothesis that nonverbal 
behavior is related to an interactant's racial 
prejudice and the racial composition of an 
interacting dyad. White high- and low- 
prejudiced subjects, acting as teachers, were 
led to verbally praise successful white and 
black students. The spoken content of the 
teacher's praise was controlled through the 
use of confederates playing the role of stu- 
dents and by employing a standard teaching 
plan, leading to similar verbal output by all 
teachers. It was expected that the nonverbal
behavior of the high-prejudiced teachers
would vary according to the race of the stu- 
dent, even when verbal behavior was in- 
variant, but that the nonverbal behavior of 
low-prejudiced teachers would be affected 
significantly less as a function of the race of 
the student. Evaluation of the stimulus 
teachers' nonverbal behavior was made by 
untrained, naive white and black judges, who 
rated videotaped samples of behavior according
to how pleased the stimulus teacher
appeared.


Method 


Subjects. Subjects, who acted as stimulus teachers,
were 36 white undergraduate females enrolled in introductory
psychology classes. They received extra
class credit for participation in a voluntary subject
pool.

The subjects were selected from a pool of about …
individuals who completed the Multifactor Racial Attitude
Inventory (MRAI), an instrument designed to
identify whites' attitudes toward blacks (Woodmansee
& Cook, 1967). The MRAI is a recently developed instrument
for measuring racial prejudice and is well standardized.
Subjects falling into the upper and lower quartiles of the
distribution of scores of those taking the inventory were
designated as high- and low-prejudiced subjects. There
were 18 high-prejudiced and 18 low-prejudiced subjects
randomly chosen from the two quartiles for participation
in the experiment.
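
The selection step described above (cut the score distribution at its quartiles, then draw 18 subjects at random from each extreme) can be sketched as follows. This is only an illustration under stated assumptions: the pool of 120 scores is made up, the function name `select_extreme_groups` is hypothetical, and the code assumes that a higher inventory score means more prejudice, which the text does not specify.

```python
import random
import statistics


def select_extreme_groups(scores, n_per_group=18, seed=0):
    """Draw n_per_group subjects at random from the bottom and top quartiles.

    scores: dict mapping a subject id to an inventory score (hypothetical data).
    Assumption for illustration: higher score = more prejudiced.
    """
    values = list(scores.values())
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    low_pool = sorted(sid for sid, v in scores.items() if v <= q1)
    high_pool = sorted(sid for sid, v in scores.items() if v >= q3)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    low_group = rng.sample(low_pool, n_per_group)
    high_group = rng.sample(high_pool, n_per_group)
    return low_group, high_group


# Made-up pool of 120 respondents with distinct scores 0..119.
pool = {f"s{i:03d}": i for i in range(120)}
low, high = select_extreme_groups(pool)
```

With this made-up pool each extreme quartile holds 30 respondents, so drawing 18 from each is always possible; a smaller pool would need a check that each quartile contains at least 18 people.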

Procedure. To observe subjects’ nonverbal behavior
under standardized conditions, a situation was devised
which kept spoken behavior constant across all conditions.
Subjects, who arrived at the experiment in pairs,
were told that the purpose of the experiment was to
study the teaching process. They were told that one of
them would be randomly chosen to act as a teacher and
to administer a test to the other. In reality, one of the
subjects was a confederate, and a rigged drawing ensured
that the subject was always made the teacher and
the confederate was always the student.

The confederate was told to enter the next room and
wait for the subject. Subjects were then shown the test
they were to administer to the confederate. The test,
called the “test of verbal-logical coordination,” had
been used in previous research (Feldman, 1976), and
actually had no correct answers. It consisted of 19
analogy-type items, with four choices for each response.

Subjects were told that the test procedure, being of
an experimental nature, required that they follow a set
procedure in administering feedback. When the student
answered a test item correctly, she was to be told,
“Right—that’s good,” and the subject was to proceed
to the next item. When the student responded incorrectly,
the subject was told to correct the student and
explain why the answer was wrong. It was emphasized
to the subject that only the phrase, “Right—that’s
good,” should follow each correct response.

To ensure that the students performed predominately
well on the test, thus providing the opportunity
for subjects to be verbally reinforcing to their students,
confederates were given a set pattern of answers supplied
in code. The confederate was made to appear as
if she were answering 16 of the 19 test items correctly
and, to provide verisimilitude, 3 erroneously.

Following the conclusion of the test, subjects were
thoroughly debriefed. Subjects were informed that
they had been secretly videotaped and were given the
option of destroying the tapes; no subject did
so.
Confederates’ behavior. The confederates were
two white and two black females, all university students.
Confederates were assigned to subjects in a random
fashion with the following stipulations: The two white
confederates interacted with 18 stimulus teachers, 9 of
whom were high-prejudiced, and 9 low-prejudiced. The
two black confederates also interacted with 9 high-prejudiced
and 9 low-prejudiced stimulus teachers.
Thus, a given confederate was randomly assigned to
interact with either 4 high-prejudiced and 5 low-prejudiced,
or 5 high-prejudiced and 4 low-prejudiced
stimulus teachers. Confederates were unaware that a
given stimulus teacher was either high-prejudiced or
low-prejudiced.
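
One way to produce a random assignment that satisfies these stipulations is sketched below. The article does not say how the randomization was actually carried out, so the procedure, the function name `assign_confederates`, and the subject and confederate labels are all hypothetical.

```python
import random


def assign_confederates(high_ids, low_ids, seed=0):
    """Randomly assign 18 high- and 18 low-prejudiced teachers to 4 confederates.

    Constraints taken from the text: each confederate race meets 9 high- and
    9 low-prejudiced teachers, and each confederate meets 9 teachers split
    4/5 or 5/4. The labels white_1, white_2, black_1, black_2 are made up.
    """
    rng = random.Random(seed)
    high, low = list(high_ids), list(low_ids)
    rng.shuffle(high)
    rng.shuffle(low)
    plan = {}
    blocks = (("white", high[:9], low[:9]), ("black", high[9:], low[9:]))
    for race, h, l in blocks:
        # The first confederate of each race gets 4 or 5 high-prejudiced teachers.
        split = rng.choice((4, 5))
        plan[f"{race}_1"] = h[:split] + l[:9 - split]
        plan[f"{race}_2"] = h[split:] + l[9 - split:]
    return plan


high_ids = [f"H{i:02d}" for i in range(18)]  # hypothetical subject labels
low_ids = [f"L{i:02d}" for i in range(18)]
plan = assign_confederates(high_ids, low_ids)
```

Each confederate ends up with nine teachers, and the shuffle keeps the pairing of a particular teacher with a particular confederate random within those limits.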

Confederates were trained to keep both their verbal
and nonverbal behavior constant with all subjects to
ensure standardization across conditions. Verbal behavior
of the confederates was limited to the coded
answers in their test booklets; confederates were told
not to initiate further conversation. They were told to
look at the test booklet at all times except when answering
the questions, when they were to briefly glance
at the subject. Confederates were told to maintain the
same body position throughout each interaction and
across subjects. Since the confederates were blind to
experimental condition and experimental hypothesis,
differential behavior was unlikely.

Preparation of videotape samples for observers.
Each stimulus teacher was covertly videotaped as she
administered the test to the confederate. The camera
was placed so that it would record a frontal view of the
stimulus teacher's face and shoulders.

A 20-second silent segment from each of the 36 interactions
was edited from the original videotapes onto
a new tape in a random order. The edited segment of
each interaction was taken starting from the point
where the confederate responded to the second test
question on the 19-item test. Thus, each sample
showed the stimulus teacher verbally praising the
confederate and then asking the next question. The
confederate's race could not be seen on the taped sample.
Judging. The new tape was shown to 20 white and 
20 black college-age female judges. Both whites and 
blacks were used to make judgments because there is
some literature showing racial differences in perception
of nonverbal behavior (Gitter, Black, & Mostofsky, 
1972), and same-sex (female) judges were used to rate 


… interacted with the (unseen) confederate. Half of the
samples showed a high-prejudiced stimulus teacher, and
half showed a low-prejudiced stimulus teacher, and
within each of these divisions half of the stimulus 
teachers interacted with a white student and half with 
a black student. This yielded nine scores for each of the 
four possible combinations. 

Data from the ratings were analyzed in a 2 × 2 × 2 ×
9 mixed design analysis of variance. The between-subjects
factor was the race of the judge (white or black),
and the within-subjects factors were the race of the
confederate (white or black), the degree of prejudice of
the stimulus teacher (high or low), and stimulus teacher
effects (the nine stimulus teachers in each cell). The
stimulus teacher factor was nested within the Confederate
Race × Subject Prejudice interaction.
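
The layout of this design, and the degrees of freedom it implies, can be checked with a few lines of arithmetic. The sketch below only counts cells; the variable names are illustrative and it does not reproduce the authors' analysis.

```python
from itertools import product

judge_races = ("white", "black")
judges_per_race = 20
student_races = ("white", "black")
prejudice_levels = ("high", "low")
teachers_per_cell = 9

# Four Student Race x Teacher Prejudice cells, nine teachers nested in each.
cells = list(product(student_races, prejudice_levels))
n_samples = len(cells) * teachers_per_cell
n_judges = judges_per_race * len(judge_races)
n_ratings = n_judges * n_samples

# Judges (the "subjects" of the analysis) are nested within judge race.
df_subjects_within_groups = len(judge_races) * (judges_per_race - 1)
# Teachers are nested within cells, so each cell contributes 9 - 1 df.
df_teachers_nested = len(cells) * (teachers_per_cell - 1)
# Error term for the nested teacher effect.
df_nested_error = df_teachers_nested * df_subjects_within_groups
```

These counts give 36 samples, 40 judges, and 1,440 ratings, with 38, 32, and 1,216 degrees of freedom respectively, which agree with the F(1, 38) and F(32, 1216) tests reported in the results.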


Results and Discussion 


Procedural check. An analysis of 
subjects’ verbal behavior showed that verbal 
responses were consistent across subjects. 
Their verbalizations consisted only of those 
phrases the experimenter told the subjects 
to employ.

Judges’ ratings. Results of the analysis 
of variance performed on the judges' ratings 
are shown in Table 1. There was a significant
main effect for race of student, F(1, 38)
= 108.23, p < .0001, stimulus person prejudice,
F(1, 38) = 51.35, p < .0001, and stimulus
teachers, F(32, 1216) = 27.75, p < .0001.


Table 1
Analysis of Variance on Ratings of Subjects

Source                                     df       MS         F
Race of judge (A)                           1      1.58       .59
Subjects within groups                     38      4.35
Race of student (B)                         1     73.35    108.23**
A × B                                       1       .03       .06
B × Subjects within groups                 38       .68
Stimulus teacher prejudice (C)              1     58.81     51.35**
A × C                                       1       .37       .32
C × Subjects within groups                 38      1.16
B × C                                       1     13.31     14.45*
A × B × C                                   1       .03       .04
B × C × Subjects within groups             38       .96
Stimulus teachers (D) within B × C         32     22.47     27.75**
A × D (B × C)                              32       .87      1.07
D (B × C) × Subjects within groups       1216       .81

* p < .0005.
** p < .0001.


Figure 1. Ratings of how pleased stimulus teachers
appear with their student.


The main effect for race of judge was not 
significant. There was also a significant 
interaction between race of confederate and 
stimulus person prejudice, F(1, 38) = 14.45,
p < .0005. No other interactions were significant.

Inspection of the means involved in these 
effects showed that in general, stimulus 
teachers were judged to be more pleased 
when interacting with a white student (M = 
4.39, where 1 = very displeased and 6 = very 
pleased on a 6-point scale) than with a black 
student (M = 3.93). In addition, low-prej- 
udiced stimulus teachers were judged to be 
significantly more pleased with their stu- 
dents (M = 4.36) than were high-prejudiced 
stimulus teachers (M = 3.96). The signifi- 
cant main effect for stimulus teachers nested 
within the Student Race X Stimulus 
Teacher Prejudice interaction indicates that 

there was significant variation in the ratings 
assigned to the nine stimulus teachers within 
each of the experimental cells. Apparently, 
there were significant individual differences 
in the nonverbal behavior of the stimulus 
teachers within each condition. 

Of most interest to the present study was 
the significant interaction between race of
student and stimulus teacher prejudice,
which is depicted in Figure 1. The interaction
indicates that the difference in ratings
accorded low-prejudiced stimulus teachers
interacting with a white versus a black student
was significantly smaller than the difference
in ratings accorded high-prejudiced stimulus
teachers interacting with a white versus a
black student. However, further analysis of
the interaction, using the Duncan multiple-range
test (Duncan, 1955), showed that
both high- and low-prejudiced stimulus
teachers appeared significantly more pleased
with a white interactant than a black interactant
(p < .01). Furthermore, Duncan
tests showed that the ratings of the low-prejudiced
stimulus teachers interacting
with a black were significantly higher than
the ratings of the high-prejudiced stimulus
persons interacting with a black (p < .01);
that is, low-prejudiced stimulus teachers
appeared to be significantly more pleased
with their black students than high-prejudiced
stimulus teachers did with their black
students.

It thus seems that the hypothesis tested
in Experiment 1 was confirmed. Untrained,
naive observers could discern that the
high-prejudiced stimulus teachers appeared
to be more pleased with white students than
with black students, even though the objective
performance of the white and black
students was identical. Of course, given that
there were differences between the ratings
of stimulus teachers nested within condition,
it is possible that the white and black
confederates also differed in their
nonverbal behavior, which could in
turn have affected the stimulus teachers'
behavior. This seems unlikely, though,
since the confederates were carefully trained
to emit the same nonverbal cues to the
stimulus teachers.

High-prejudiced stimulus teachers were
also rated as significantly less pleased with
black students than were low-prejudiced
stimulus teachers, as predicted. These results
support the hypothesis that prejudiced
attitudes are related to an individual's nonverbal
behavior. In sum, it appears that the
high-prejudiced female stimulus teachers,
selected on the basis of their expression of
negative attitudes toward blacks on a written
measure, displayed nonverbal behavior
congruent with the negative attitudes.

It is important to note that even the low-prejudiced
stimulus teachers appeared more
pleased with the white than black students,
although the difference in ratings was significantly
smaller than for the high-prejudiced
stimulus teachers. Even the relatively
unprejudiced stimulus teachers acted less
positively toward black students than white
students. This finding suggests that there
may be a generalized tendency on the part of
whites toward holding more positive attitudes
toward other whites than toward
blacks. Expanding upon this notion, it is
possible that blacks, too, might generally
hold more positive attitudes and affect
toward members of their own race, as opposed
to members of other races. Indeed,
survey data suggest that both whites and
blacks hold more positive feelings regarding
same-race persons (e.g., Derbyshire & Brody,
Hraba & Grant, 1970; Simpson &
Yinger, 1972). If this is the case, we would
expect that both whites and blacks would
behave more positively nonverbally toward
same-race than cross-race individuals.


Experiment 2


In this study, a teaching paradigm similar
to that used in Experiment 1 was employed
to test the hypothesis that same-race stimulus
teachers would appear more pleased
nonverbally with same-race students than
with cross-race students. It was expected
that both whites and blacks would manifest
such differential nonverbal behavior. In
order to make the situation more representative
of an actual teaching situation,
subjects in this experiment actually taught
a short lesson to a third-grade student, who
was a confederate. Furthermore, subjects
in this experiment were not preselected on
the basis of their attitudes toward cross-race
persons; instead, it was assumed that (as the
results of Experiment 1 indicated) the phenomenon
of differential nonverbal behavior
toward same- and cross-race subjects would
be sufficiently general to appear in a randomly
selected sample of the population.
Hence, Experiment 2 provides a strong test
of the hypothesis that white stimulus
teachers would appear more pleased with
white than black students, and that black
stimulus teachers would appear more
pleased with black than white students.


Method 


Subjects. Subjects, who acted as stimulus teachers 
to a student (confederate), were 40 undergraduate fe- 
males enrolled in introductory psychology classes. 
They received extra class credit for their voluntary 
participation in the experiment. Half of the subjects 
were white, and half were black. 

Procedure. A situation similar to that employed in 
Experiment 1 was used. Subjects were told that they 
would be acting as a teacher to a third-grade student.
They were given a brief standardized lesson on trapezoid
identification to teach their student, and then they
were instructed to administer a 14-item test. The test 
presented both positive and negative instances of 
trapezoids, and the student’s task was to identify the 
figure as an example or nonexample of a trapezoid. 
Both the lesson and test had been pretested and used 
in previous research (Allen & Feldman, 1974).

Subjects were told that the experimental nature of 
the lesson required that they use a standard procedure 
in administering feedback to their student. They were 
told that only the phrase, “Right—that’s good,” should 
be said following a correct response, while incorrect 
responses should be followed with an explanation of why 
the student was wrong. The subject was encouraged 
to use only the words, “Right—that’s good,” following
a correct response. 

Confederates played the role of student. Unbe- 
knownst to the subject, a set pattern of answers was 
supplied in code to the confederate. The confederate 
was made to appear as if he were answering 12 of the 14 
test items correctly. To increase realism, the confed- 
erate responded with errors twice. 

Subjects were debriefed following the conclusion of 
the lesson. No subject expressed awareness of the hy- 
potheses of the study or that the student was a confed- 
erate. Subjects were told that they had been video- 
taped and that the tape could be erased, but no subject 
chose this option. 

Confederates. Two white male and two black male 
third-grade confederates served as students. Since the 
race of the subject varied in addition to the race of the 
confederate, subjects were randomly assigned to teach 
one of the confederates with the restriction that half the 
white subjects teach a white confederate and half a 
black confederate. In addition, assignment of the 
subjects was arranged so that each of the two white 
confederates and each of the two black confederates was 
used with an equal number of subjects. 

The confederates were children from a local school 
system and were paid for their participation. They 
were carefully trained to ensure that both their verbal 
and nonverbal behavior remained constant across all 
subjects. The confederates were instructed to keep 
their eyes focused on the lesson materials and to remain 
in the same postural position for every subject in order 
to give the impression that they were intently studying
the materials. Hence, it is unlikely that the confederates
reacted differentially to particular subjects.
Preparation of videotape samples for observers. 
Videotape recordings of the stimulus teachers were 
made (by a hidden camera) as they administered the 
test to the confederate. The camera was situated so
that the stimulus teacher's face and upper body were 
recorded during the session. A standardized sample of 
each stimulus teacher’s nonverbal behavior was pre- 
pared to be shown to judges. A 20-second sample, 
without sound, of each of the 40 stimulus teachers’ 
nonverbal behavior was edited onto a new master vid- 
eotape in a random order. Each sample was taken from 
the identical portion of the 14-item test at a time when 
the stimulus teacher was initially praising the confed- 
erate and then asking the response to the next question. 
Verbal behavior was thus the same in each of the sam- 
ples. Since there were 40 subjects acting as stimulus 
teachers, and since equal numbers of white or black 
stimulus teachers taught a white or black confederate, 
this meant that there were 10 samples from each of the 
four types of subject-confederate pairings. 

Judging. Observers, who acted as judges of the 
stimulus teachers’ nonverbal behavior, were 12 white 
female undergraduates who were blind as to the hy- 
potheses of the study. Since the results of Experiment 
1 show no difference in the decoding as a function of 
race of observer, only white judges were used; and 
same-sex judges were used to make ratings of the female 
stimulus teachers.

As in Experiment 1, the judges were told that the
tape showed teachers giving a short test to a student, 
and that the teachers were telling the students that they 
were answering correctly (although this could not be 
heard, since the samples were silent). The judges were 
asked to determine how pleased the teachers were with 
their students using the 6-point scale employed in Experiment 1.
After each of the 40 20-second samples was
shown to the observers, there was a pause to allow time 
to make the rating. 

Data analysis. Each observer saw all 40 samples. In
20 of the samples a white stimulus teacher was shown;
the white stimulus teacher was interacting either with
a white or black (unseen) student. The remaining
clippings showed a black stimulus teacher who had interacted
either with a white or black (unseen) student.
Two orthogonal planned comparisons were used to test
the hypothesis that for both white and black stimulus
persons, same-race students would appear to receive
more positive nonverbal behavior than cross-race students.
One contrast compared ratings given the white
stimulus persons who were reinforcing a white
student with the ratings given white stimulus teachers
verbally reinforcing a black student. The second
planned comparison contrasted ratings of black stimulus
teachers praising a white student with black
stimulus teachers praising a black student.
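
A planned comparison of this kind can be sketched as a within-judge contrast. The article does not spell out the error term, so the version below rests on an assumption: each judge's mean rating in one condition minus her mean rating in the other is treated as the unit of analysis, which yields F(1, n - 1) for n judges (F(1, 11) for 12 judges, the degrees of freedom reported in the results). The function name `contrast_f` and the example ratings are made up.

```python
import math
import statistics


def contrast_f(cond_a, cond_b):
    """F ratio for a within-judge contrast between two conditions.

    cond_a, cond_b: one mean rating per judge, in the same judge order.
    Returns (F, (df_numerator, df_denominator)); F is the squared paired t.
    Assumes the difference scores are not all identical (nonzero variance).
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, from the sample standard deviation.
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    t = mean_diff / std_err
    return t * t, (1, n - 1)


# Made-up per-judge means for three judges, for illustration only.
f_value, df = contrast_f([4.0, 5.0, 6.0], [3.0, 5.0, 4.0])
```

For the made-up numbers the difference scores are 1, 0, and 2, so F is about 3.0 on (1, 2) degrees of freedom. Each of the two contrasts in the text would be computed the same way from the relevant pair of conditions.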


Results and Discussion 


Results of the judges’ ratings showed that 
the white stimulus teachers were rated as 
being significantly more pleased with the
student when they were praising a white
student than a black student, F(1, 11) = 8.29,
p < .025, as predicted. The mean rating for
the white stimulus teachers interacting with
a white student was 4.08, while the rating for
white stimulus teachers with a black confederate
was 3.88, where 1 = very displeased,
and 6 = very pleased. This directly supports
the findings from Experiment 1. 

In contrast, judgments of the black stimulus
teachers showed no significant differentiation
according to the race of the student.
The planned comparison contrasting
the ratings of black stimulus teachers
praising a white student with those of black
stimulus teachers praising a black student
yielded a nonsignificant
result, F(1, 11) = 1.44, p > .20. There
was little judged difference between blacks 
who were interacting with a white (M = 4.27) 
or a black (M = 4.18) student. 

In summary, it appears that the ratings of 
the white judges supported the hypothesis 
that there would be differential displays of 
affect according to the race of the student,
but only when considering the white
stimulus teachers. The white stimulus 
teachers were judged to be more pleased with 
the white students than with their black 
students. But the judges could discern no
difference between the black stimulus 
teachers interacting with a white student 
versus a black student. 

Still, before concluding that the black 
subjects showed no differential displays of 
affect, it seemed reasonable to reexamine the 
judging procedure. The ratings of the 
stimulus teachers had been made by a group 
of white judges. Although there was no ev- 
idence from Experiment 1 that there would 
be differences between ratings made by white
and black judges, other research has found
that there are racial differences in decoding
(e.g., Gitter et al., 1972). Hence, it seemed
appropriate to have a group of blacks act as 
judges of the nonverbal behavior of the 
stimulus persons in the master tape. 

A group of 20 black female undergraduates
were shown the identical videotape as
that shown the white judges, and they were
asked to make judgments using the same
6-point scale used by the white judges.
Their ratings were analyzed using the same
planned comparisons used on the white
judges’ ratings.

Analysis of the black judges’ ratings revealed
that black stimulus teachers who were
interacting with black students appeared
significantly more pleased (M = 4.64) than
did black stimulus
teachers interacting with a white student (M
= 4.38), F(1, 19) = 4.58, p < .05. It appears
that indeed the black stimulus teachers did
communicate differential affect according
to the race of their student, in support of the
hypothesis.

In an interesting contrast to the ratings of
the white judges, analysis of the black judges’
ratings showed that black judges did not
distinguish a difference between white
stimulus teachers interacting with a white
student compared with white stimulus
teachers interacting with a black; the
planned comparison was not significant, F(1,
19) = 1.59, p > .20. White stimulus teachers
interacting with a white student were given
a mean rating of 4.12, while white stimulus
teachers interacting with a black were given
a mean rating of 3.99.

Overall, it appears that the hypothesis of
differential nonverbal behavior according to
the racial composition of a dyad was supported
in Experiment 2, at least when looking
at judgments made by raters of the same
race as the stimulus teachers in question.
White judges rated white stimulus teachers
as being more pleased nonverbally with
white than black students; black judges
rated black stimulus teachers as showing
more pleasure with black than white students.
Although neither white nor black
judges in Experiment 2 could distinguish
reliably nonverbal behavioral differences in
stimulus teachers of a race other than their
own, it seems that there were differential
displays of affect according to the race of the
stimulus teachers’ students.

One alternative hypothesis that cannot be
unequivocally ruled out, however, is that the
confederates behaved differentially according
to the race of their teacher. If this
were the case, the stimulus teachers’ nonverbal
responses could have been due simply
to the confederates’ nonverbal behavior and
not due to generalized prejudice. This explanation
has some cogency, particularly
given that stimulus teachers had the opportunity
to interact with the confederate
while teaching the concept prior to administering
the test. This alternative hypothesis
seems unlikely, though, in view of the
steps that were taken to train and control the
confederates’ behavior. Still, it cannot be
rejected entirely.


General Discussion 


There are a number of conclusions that 
can be drawn on the basis of the results of the 
two experiments. First, it appears that the 
nonverbal behavior of the female stimulus 
teachers used in the experiments was related 
to the attitudes they held toward the race of 
the person with whom they were interacting.
Second, both the white and black stimulus 
teachers behaved more positively nonver- 
bally toward members of their own race than 
toward cross-race persons. Third, extrap- 
olating from these results, it would seem that 
the females tested in the experiment gener- 
ally held more positive attitudes toward 
members of their own race than other 
races. 

Our explanation for these findings is that 
the mere presence of an attitudinal 
object—in this case, a dyadic partner—is 
sufficient to evoke nonverbal behavior re- 
lating to the attitude held toward the part- 
ner. This nonverbal behavior may be rep- 
resentative of the individual’s generalized 
feeling toward the attitudinal object, rela- 
tively independent of the setting, or it may 
be a function of the individual's more spe- 
cific attitude toward a person within the 
particular situation (e.g., attitude toward a 
white versus black student). In either case, 
the results from the two experiments suggest 
that both the white and black stimulus 
teachers held attitudes favoring students of 
their own race, and their nonverbal behavior 
revealed these underlying attitudes.

It should be stressed that the subjects in 
both experiments were responding relatively 
negatively on a nonverbal level toward dif- 
ferent-race partners while verbally saying 
quite positive things to them (“Right—that’s 
good”). That such nonverbal responses 
were discernable even in the short, 20-second 
samples the judges viewed suggests that 
there may have been a significant amount of 
verbal-nonverbal channel incongruency
occurring. If such incongruency were to 
occur frequently in real-life settings, it would 
almost certainly increase the aversiveness of 
interracial interaction, relative to same-race 
interaction, and maintain and support 
prejudicial and discriminatory behaviors. 
Why did both whites and blacks discern 
in the same way the nonverbal behavior of 
the white stimulus teachers in Experiment 
1 while there was a substantial difference 
between the judgments of the white and 
black observers in Experiment 2? Although 
the anomaly may be due to procedural dif- 
ferences between the two experiments (the 
use of college students versus third graders 
for confederates, different types of lessons, 
different test lengths), it seems most likely 
that the reason for the difference lies in the 
subject selection process. It will be recalled 
that subjects in Experiment 1 were in the top 
and bottom quartiles of a distribution of 
scores on a measure of prejudice, while 
subjects were not preselected in Experiment 
2. Hence, we would expect that differences 
in nonverbal behavior would be more pro-
nounced in Experiment 1 than in Experi-
ment 2 and, therefore, more easily decoda-
ble.
Still, this reasoning does not explain the 
underlying locus of causality for why the 
same-race judges in the second experiment 
should be more sensitive than cross-race 
judges. The finding suggests that there are 
at least some subcultural differences be- 
tween whites and blacks in the encoding or 
decoding (or both) of nonverbal behavior. It 
seems plausible that there are differences in 
the encoding process; whites and blacks may 
display affect somewhat differently. It is 
just as likely, however, that the decoding 
process was the locus of cross-race judgment 
differences. The judges may have used 
slightly different judgment criteria when 
making inferences about how pleased same- 
as compared with cross-race individuals 
were. Equally possible is that both encoding 
and decoding processes varied simulta- 
neously. Of course, these subcultural dif- 
ferences appear to be relatively minor given 
that they did not occur in the instances of 
stronger affect displayed in Experiment 1. 
The present findings are congruent with 
prior research on the relationship of race and
nonverbal behavior. For instance, Word,
Zanna, and Cooper (1974) found that white
subjects acting as interviewers behaved dif-
ferentially according to the race of the in-
terviewee. (They did not use black
subjects.) In addition, Fugita, Wexley, and
Hillery (1974) examined nonverbal behavior
of white and black interviewees during an
actual job interview and found differential
nonverbal behavior as a function of the race
of the interviewer. However, these prior
results are somewhat equivocal, as verbal
behavior was not entirely standardized. In
the present study, verbal behavior was held
constant across conditions.
It should be noted that the results of the
studies provide a rather robust demonstra-
tion of the phenomenon of differential non-
verbal behavior due to the racial composition
of an interacting dyad. Because of the
procedure employed, the verbal behavior of
the subjects was held quite constant across
conditions, thus allowing differences in
nonverbal behavior to be interpreted un-
equivocally. It is also important to note that
subjects were providing verbal reinforcement
to a successful student whom they were
teaching. The confirmation of the hypoth-
esis under these circumstances provides
strong evidence for the existence of differ-
ential nonverbal behavior in dyadic inter-
actions. Finally, the use of a relatively large
number of subjects as stimulus persons in
each condition (9 or 10) suggests that the
phenomenon is quite general and not related
to idiosyncrasies of specific encoders.
On an applied level, results of the study
seem to suggest some potentially crucial
factors that may operate in educational
settings and which deserve further attention.
If, for instance, students are aware of dif-
ferential nonverbal behavior accorded to
white and black students by teachers, it is
likely that this will have a substantial influ-
ence on the learning process. Likewise, it is
possible that feelings of confusion may result
if, for example, a black student finds a
white teacher being verbally reinforcing but
displaying relatively negative nonverbal fa-
cial expressions. Further research into the
effects on the learning process of incon-
gruent verbal and nonverbal behavior would
seem worthwhile.


Reference Note 


1. Katz, P. A. Racial attitudes, perception and 
change. Paper presented at the 83rd Annual 
Meeting of the American Psychological Association, 
Chicago, August 1975. 
Perceptions of Self-Managed Learning Opportunities
and Academic Locus of Control: 
A Causal Interpretation 


Marshall Arlin
University of British Columbia
Vancouver, Canada

Theodore W. Whitley
East Carolina University


Subjects were 566 students in Grades 5, 6, and 7. They were assessed at the
beginning and the end of a school year on their perception of opportunities for
self-management of their instruction and perception of academic locus of con-
trol. Data were analyzed by a cross-lagged panel correlation technique. Per-
ceptions of opportunity for self-management of instruction were causally prior
(p < .05) to perceptions of academic locus of control. It is suggested that if
students perceive the classroom as a place where they can in part manage their
own instruction, then they are likely to accept responsibility for their academ-
ic successes and especially for their academic failures.


Academic locus of control may be an im- 
portant attitude variable in school settings. 
Students perceive the locus of control in a 
learning situation along a continuum from 
internal to external (Crandall, Katkovsky, 
& Crandall, 1965). Students who perceive 
an internal locus take responsibility for their 
own academic successes and failures. Stu- 
dents who perceive an external locus place 
responsibility for academic successes and 
failures upon external forces such as luck or 
the teacher (“I was lucky on the test”; “the
teacher was unfair”).

Perception of opportunities to manage 
one’s own learning may influence attribution 
of locus of control within school settings. It 
is possible that students’ perceptions about
opportunity to move about the room, to work 
at their own rate, and to choose their own 
activities will affect their locus of control. 
As students come to perceive that they help 
to determine the activities in which they 
engage, students may also become more 
willing to take personal responsibility for 
successes or failures that accrue to those 
activities. It may be easier to blame failures 
at a learning task upon the teacher (external) 


Requests for reprints should be sent to Theodore W.
Whitley, Center for Educational [i caf 
Evaluation, Division of Health Affairs, East Carolina 
University, Greenville, North Carolina 27834. 
if students perceive that the task is one that
has been forced upon them by the teacher.
Conversely, it may be more difficult to ra-
tionalize failures at a learning task in terms
of external sources if students have partici-
pated in choosing the task. A similar argu-
ment applies for willingness to ascribe suc-
cess at academic tasks to oneself. If stu-
dents perceive that they have helped deter-
mine the tasks undertaken, then success
might be considered a result of their own
effort. On the other hand, if students per-
ceive that tasks undertaken are determined
primarily by the teacher, then they might be
more likely to ascribe success to external
forces such as good luck.
Although we have presented an argument
for a directional relationship leading from
perceptions of self-management to academic
locus of control, we expect that the rela-
tionship is bidirectional. As students be-
come more (or less) willing to accept re-
sponsibility for their academic successes or
failures, they may become more (or less)
willing to perceive the classroom as providing
sufficient opportunities to manage their own
instruction (to control their own academic
destinies). The major purpose of the study
was to determine whether or not one pattern
of directionality predominates. In more
formal terms, we wished to see if the time-
lagged relationship between the two vari-
ables was in part asymmetrical, rather than
bidirectional, and if the alternate hypothesis
of spuriousness could be excluded.
A second concern of the study was to ex- 
amine the effect of educational setting upon 
the hypothesized relationship between the 
two variables. Wang and Stiles (1976) found 
that a learning-management program (a
self-schedule system) significantly affected
students’ perceptions of self-responsibility 
for their learning. Arlin (1975) found that 
classroom settings that allowed opportuni- 
ties for self-management of learning (“open” 
classrooms) were more conducive to positive 
attitudes than teacher-directed classrooms 
for students with an internal locus of control. 
It seemed possible that classroom settings 
that allowed for a considerable degree of 
self-managed learning might be conducive 
to a different directional relationship be- 
tween classroom perceptions and locus of 
control than the directional relationship in
“traditional” classrooms.

A third concern of the study was to ex-
amine the effect of locus of control valence
upon the hypothesized relationship between
perceptions of management and locus of
control. A positive valence refers to internal
or external ascriptions of success, and a
negative valence refers to internal or external
ascriptions of failure. Crandall et al. (1965)
and Weiner and Kukla (1970) have sug-
gested that self-responsibility for success
(positive) is independent of self-responsi-
bility for failure (negative). It is thus pos-
sible that the hypothesized directional re-
lationship between perceptions of self-
management and locus of control may differ
for positive versus negative ascriptions of
locus of control.

The following analysis was conducted to
test for a directional time-lagged relationship
between perceptions of opportunity for
self-management and locus of control, and
to see if this relationship differed for edu-
cational setting and for locus of control va-
lence.


Method 


Subjects 


Subjects were 566 students in Grades 5 (n = 200), 6
(n = 204), and 7 (n = 162). Of these, 258 came from a
school that encouraged individualization and self-
management of instruction, and 308 came from a school
that placed more emphasis on teacher management of 
instruction. More specifically, the latter 308 students 
were enrolled in traditional self-contained classrooms 
in which there was no formalized method of indivi- 
dualizing instruction. Instruction in the former school 
was based upon skills continua in language arts and 
mathematics developed by the professional staff of the 
school system. Each continuum was composed of the 
skills that each student in the school system was ex-
pected to acquire during his or her educational career.
Each teacher was responsible for determining the skills 
that each student had acquired and for placing students 
at the appropriate point on each continuum. The 
students then progressed along the continua at their 
own rates and, with certain restrictions, according to 
their own interests. For example, although the stu- 
dents were expected to work on language arts at a par- 
ticular time, they worked on tasks appropriate to their 
level of skills acquisition and could shift to other ma- 
terial while they were waiting for the teacher to help 
them with problems or check their work. Both schools
were located in small communities in the Appalachian 
region of North Carolina. Both communities were 
relatively stable, with pupil turnover rates of less than 
5% during the year. Members of minority races com- 
prised less than 10% of the student population in both 
schools. 


Instruments 


Academic locus of control was measured by a short
form¹ of the Intellectual Achievement Responsibility
(IAR) Scale, which was derived from the full scale de-
veloped by Crandall et al. (1965). The short form dif-
fers from the IAR in that it consists of 18 instead of 34
forced-choice items such as, “When you do well on a test
at school, is it more likely to be (a) because you studied
for it, or (b) because the test was especially easy?” The
respondent must check either a or b. Nine items assess
negative IAR or willingness to accept personal respon- 
sibility for failure, and 9 items assess positive IAR or 
willingness to accept personal responsibility for success. 
Answers were scored with 0 for an external response and 
1 for an internal response, and all responses were com- 
bined to yield a total IAR (or internality) score. 
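
The scoring rule just described (0 for an external response, 1 for an internal response, summed within each valence and overall) can be sketched as follows. This is only an illustration: the four-item key below is hypothetical, since the actual 18-item key of the short form is not reproduced in the text, and the names `Item` and `score_iar` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    valence: str          # "positive" (success item) or "negative" (failure item)
    internal_choice: str  # which option, "a" or "b", counts as the internal response


def score_iar(items, responses):
    """Score forced-choice answers: 1 per internal response, 0 per external one."""
    if len(items) != len(responses):
        raise ValueError("one response is required per item")
    scores = {"positive": 0, "negative": 0}
    for item, answer in zip(items, responses):
        if answer not in ("a", "b"):
            raise ValueError("each response must be 'a' or 'b'")
        scores[item.valence] += 1 if answer == item.internal_choice else 0
    # The total (internality) score combines both valences.
    scores["total"] = scores["positive"] + scores["negative"]
    return scores


# Hypothetical 4-item key (the real short form has 9 positive and 9 negative items).
KEY = [
    Item("positive", "a"),
    Item("negative", "b"),
    Item("positive", "b"),
    Item("negative", "a"),
]

result = score_iar(KEY, ["a", "b", "a", "a"])
print(result)
```

With the full instrument, the positive and negative subscores would each range from 0 to 9 and the total from 0 to 18.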

Perception of opportunities for self-management of 
learning (SML) was assessed by the Attitude Toward 
Learning Processes instrument (Arlin & Hills, 1976, 
Form A, elementary level).² This instrument contains
15 items with a scaled response of no, sometimes, usu- 
ally, or yes to each. Typical item stems are the fol- 
lowing: “We get enough chances to choose our own 
activities in class”; “I have to spend too much time 
sitting at my desk”; “I have enough chances to work at 
my own speed”; and “I have enough chances to work on 


¹ A copy of the IAR short form may be obtained from
the second author.
² This instrument was originally developed at the
North Carolina Advancement School, Winston-Salem,
North Carolina, in 1974.
special things that interest me.” The instrument was
developed in a cartoon-pictorial format. The reliability 
of the instrument was reported as .90 on a norming 
sample of 14,000 students. 


Procedure and Analysis 


Both instruments were administered to each of the 
566 students at the beginning and at the end of one 
school year. The instruments were scored and analyzed 
by a cross-lagged panel correlation technique (Kenny, 
1975; Pelz & Andrews, 1964). According to this tech- 
nique, panel data can be analyzed to indicate which of 
two variables A and B, each measured at Time 1 and 
Time 2, is more likely to have causal priority over the 
other. If A determines B, rather than the reverse, then
the cross-lagged correlation A1-B2 should exceed B1-A2.
Unlike most correlation techniques which do not permit 
directional or causal analyses, the cross-lagged panel 
correlation technique does permit tentative causal in- 
ferences. 
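
The logic of the comparison can be illustrated with a small simulation. The sketch below uses synthetic data, not the study's data: variable A at Time 1 is built to influence B at Time 2, while B at Time 1 has no influence on A at Time 2. The coefficients (0.7, 0.5, 0.3), the noise level, and the function names are assumptions chosen only for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_panel(n=500, seed=0):
    """Two-wave panel in which A1 drives B2 but B1 does not drive A2 (synthetic)."""
    rng = random.Random(seed)
    a1 = [rng.gauss(0, 1) for _ in range(n)]
    b1 = [rng.gauss(0, 1) for _ in range(n)]
    a2 = [0.7 * a + rng.gauss(0, 0.7) for a in a1]
    b2 = [0.5 * a + 0.3 * b + rng.gauss(0, 0.7) for a, b in zip(a1, b1)]
    return a1, b1, a2, b2


a1, b1, a2, b2 = simulate_panel()
r_a1_b2 = pearson_r(a1, b2)  # cross-lagged correlation A1-B2
r_b1_a2 = pearson_r(b1, a2)  # cross-lagged correlation B1-A2
print(f"A1-B2 = {r_a1_b2:.2f}, B1-A2 = {r_b1_a2:.2f}")
```

In this constructed case A1-B2 clearly exceeds B1-A2, which is the pattern the technique reads as tentative evidence that A is causally prior to B.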

The patterns for the cross-lagged and other correla- 
tions for the composite of both types of school settings 
and both locus of control valences are shown in Figure 
1. The cross-lagged correlations were tested by the 
Pearson-Filon test (Peters & Van Voorhis, 1940), as 
recommended by Kenny (1975). In the language of 
cross-lagged correlation analysis, the significant dif- 
ference that was found (z = 2.93, p < .01) allows rejec- 
tion of the hypothesis of spuriousness. The perception 
of self-management of learning was causally prior to 
perception of academic locus of control. Perception of 
self-managed learning predicted later locus of control 
more highly than locus of control predicted later per-
ceptions of self-managed learning.
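
The article does not print the Pearson-Filon formula. As a hedged sketch, the code below uses the large-sample covariance commonly given for two dependent correlations that share no variable, and applies it to the "IAR total" row for all students in Table 1. Two assumptions are made: that this form matches the computation the authors used, and that the table columns are ordered I1-L2, L1-I2, I1-I2, L1-L2, I1-L1, I2-L2. The function name and argument names are invented for the example.

```python
import math


def pearson_filon_z(n, r_a1b2, r_b1a2, r_a1a2, r_b1b2, r_a1b1, r_a2b2):
    """z for the difference between two cross-lagged correlations (A1-B2 vs. B1-A2).

    psi is N times the large-sample covariance of the two correlations,
    which share no variable.
    """
    psi = (
        0.5 * r_a1b2 * r_b1a2
        * (r_a1b1 ** 2 + r_a1a2 ** 2 + r_b1b2 ** 2 + r_a2b2 ** 2)
        + r_a1b1 * r_a2b2
        + r_a1a2 * r_b1b2
        - (
            r_a1b2 * r_a1b1 * r_a1a2
            + r_a1b2 * r_b1b2 * r_a2b2
            + r_b1a2 * r_a1b1 * r_b1b2
            + r_b1a2 * r_a1a2 * r_a2b2
        )
    )
    variance = (1 - r_a1b2 ** 2) ** 2 + (1 - r_b1a2 ** 2) ** 2 - 2 * psi
    return math.sqrt(n) * (r_a1b2 - r_b1a2) / math.sqrt(variance)


# "All students, IAR total" row of Table 1, with A = SML (L) and B = IAR (I).
z = pearson_filon_z(
    n=566,
    r_a1b2=0.35,  # L1-I2 (cross-lagged)
    r_b1a2=0.21,  # I1-L2 (cross-lagged)
    r_a1a2=0.54,  # L1-L2 (stability)
    r_b1b2=0.46,  # I1-I2 (stability)
    r_a1b1=0.38,  # I1-L1 (synchronous, Time 1)
    r_a2b2=0.43,  # I2-L2 (synchronous, Time 2)
)
print(f"z = {z:.2f}")
```

Because the inputs are the two-decimal correlations from the table, the result should land near, but not exactly on, the reported z of 2.93.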

For purposes of the secondary analyses, students were 
grouped in eight additional manners: by valence, by 
setting, and by the subgroups of valence and setting. 
The results of these additional groupings, plus the pri- 
mary grouping, are presented in Table 1. Three of the 
secondary tests were significant, and the other five 
followed the pattern of the composite test. The pat- 


Figure 1. Cross-lagged and other correlations between 
Intellectual Achievement Responsibility (IAR) and 
perceptions of self-managed learning opportunities 
(SML) at Time 1 and Time 2. (SML assessed by Atti- 
tude Toward Learning Processes instrument.) 
terns appear to provide some indication of stationarity,
since the synchronous correlations (both variables at the
same time) do not appear to change. Synchronicity
appears to have been satisfied, since both variables were
measured at the same time. (Both stationarity and
synchronicity are assumptions necessary for a cross-
lagged panel analysis.)


Discussion 


We agree with Kenny (1975) that cross-
lagged panel correlation analysis is best used
in the exploratory stage of theory construc-
tion. We will, accordingly, interpret our
results in a tentative manner. The results
of the present study should be viewed as one
step beyond the identification of association
between perceptions of self-management
and locus of control and one step prior to
experimental test of the causal relationship
between the two variables. For subsequent
experimental tests, it must be remembered
that our focus has been upon perceptions of
opportunities for self-managed learning.
We agree with Lefcourt (1973) and Perl-
muter and Monty (1977) that perceptions of
control over one’s environment (or e


learning, the experimental test would have
been direct: experimentally manipulate
opportunities for self-managed learning and
observe the effects upon locus of control (cf.
Wang & Stiles, 1976). But opportunity for
self-management is more easily treated as a
dependent variable rather than an inde-
pendent classification variable. In a
Treatment × Levels design, students should
be classified by levels of management per-
ception (e.g., high and low perceptions) prior
to randomized assignment to treatments
(e.g., teacher-structured vs. student-struc-
tured). It does not appear logical to assess
perceptions of management opportunities
prior to initial exposure to one or more of the
various potential treatments, none of which
has yet been experienced. Consequently,
prior to assignment there is no basis for
forming a perception of opportunities for
self-managed learning, and hence there is no
basis for categorizing students by levels.
Research continuing beyond the tentative
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Table 1 ^ 

oss-Lagged and Other Correlations Between Intellectual Achievement Res ibility (I 
Self-Managed Learning (L) at Two Times for Nine Groupings 4 ele an 


— aiaaaaaaaaaaaasasaseasasasasesasesaesesasasasmsmsmsmsmsmsmsts 


Coefficients Pearson- 
Cross-lagged Stability Synchronous Filon 
Group n L-Lo; Lı- h- Li-L?  dH-Li Izl z test 
All students 566 
IAR positive 12 24 33 54 21 29 2.29** 
^ JAR negative 22 36 46 54 40 45 2.90* * 
j IAR total 21 35 46 54 38 43 2.93** 
\idividualized classrooms 258 
IAR positive 15 24 30 51 26 25 1.15 
IAR negative 23 39 43 51 46 48 2.31* 
IAR total 24 36 42 51 45 43 171 
Traditional classrooms 308 
TAR positive 13 19 35 48 19 19 83 
IAR negative 29 33 49 48 40 33 62 
IAR total 26 31 51 48 37 31 16 


Note. IAR = Intellectual Achievement Responsibility Scale. 
* p < 05. 
Ip«0. 


tests we have used will require experimental 
ingenuity. 

Until such time as the causal hypothesis
is put to experimental test, we feel that the 
results of this study are plausible and war- 
rant tentative acceptance. The results may 
at first glance appear to be based on small 
differences in correlations. But the cross-
lagged differentials observed are by no
means minuscule within the scope of the
cross-lagged tradition. As an example, in a
recent careful use of the cross-lagged tech-
nique, Atkin, Bray, Davison, Herzberger,
Humphreys, and Selzer (1977) described
their cross-lagged differentials of approxi-
mately .13 as “large” and “more exciting”
than the small differentials typically ob-
served in cross-lagged studies.

We feel that it is reasonable to conclude
that under certain conditions, perception of
opportunities for self-managed learning is
causally prior to academic locus of control.
If pupils at the beginning of a school year
perceive that they are in a classroom in
which self-management of their learning is
encouraged, then they are likely to develop
a greater willingness to accept responsibility
for their academic successes and failures.
From a comparison of the subanalyses, we
would like to speculate about the specificity
of this causal priority. It would appear that


this pattern is more likely in classrooms that 
provide actual opportunities for self-man- 
agement (in this case classrooms attempting 
individualized programs). Although our 
two-wave design assessed perceptions at the 
beginning and the end of a school year, we 
would speculate that the causal priority is 
not limited to initial perceptions. We think 
that as students gradually come to realize 
that the teacher does encourage self-man- 
agement of learning, and that as they begin 
to experience some success at self-managed 
learning, their internal ascriptions of locus 
of control would increase. Similarly, as 
students find a self-managed program in- 
adequate, or experience failures in managing 
their learning, ascriptions of internal locus 
of control might decrease. 

It is interesting that the causal priority 
appears more powerful for the negative va- 
lence of locus than for the positive valence. 
When this result is combined with the ob- 
servation that the pattern appears stronger 
in classrooms that encourage self-manage- 
ment, the following hypothesis is suggested: 
In self-managed classrooms, as students 
gradually perceive that they have opportu- 
nities for choosing their own academic ac- 
tivities, they gradually take personal re- 
sponsibility for failures at these same ac- 
tivities. Perhaps a willingness to take re-
sponsibility for failure is more sensitive to
perceptions of self-management opportu- 
nities than is the willingness to take re- 
sponsibility for academic successes. 

In both traditional and self-managed 
classrooms, students may find it equally easy 
to ascribe success to themselves, but they 
might not find it equally easy to ascribe 
failure to external sources. The plausible 
“excuses” might differ for the two classroom 
conditions. In a teacher-directed classroom, 
the rationalization that failure is due to the 
fault of the teacher or bad luck may be rela- 
tively easy for students to accept (to con- 
vince themselves). They can allege that the 
teacher has forced them into “stupid,” 
“dumb,” or meaningless activities so that 
failure is not their fault but the teacher’s and 
the teacher’s activities. However, students 
who have chosen their own activities may 
find it much more difficult or incongruous 
with their experience to blame failures upon 
external sources such as the teacher. Thus, 
our tentative hypothesis is that self-managed 
programs may alter perceptions of oppor- 
tunities for choosing one’s own learning, and 
that these perceptions may influence stu- 
dents to take responsibility for failures at 
those activities they believe they have cho- 
sen. We find this hypothesis intriguing for 
further research on the development of ac-
ademic locus of control in classroom set-
tings.
Teachers’ Attitudes Toward Educational Goals
as Reflected in Classroom Behavior 


Liya Kremer
University of Haifa, Haifa, Israel 


This study is concerned with some possible effects of the emerging trend to 
prefer progressive over traditional goals of education. The following were the 
variables studied: Attitudes toward educational goals; expectations of 
achieving educational goals; perceived knowledge about strategies and means
intended to achieve goals; personality traits; teaching behaviors. Findings
point to more congruity among variables in the traditional than in the progressive
domain. A phenomenon of “pseudoprogressivism” was reflected in the inter-
action among attitudes, personality traits, and teaching behaviors. The study
suggests implications for preservice and in-service teacher education. 


Progressive goals of education have been 
advocated since the beginning of the century, 
Dewey's (1902) ideology being an important
influence. Studies, however, indicate that 
traditional ways of teaching still dominate 
the educational scene (Dodl, 1966; Lamm, 
1973). 

This study is concerned with the gap be-
tween the importance attached to progres-
sive goals in education, on the one hand, and
lack of their implementation, on the other.
There are two likely sources for this phe-
nomenon: (a) the relative difficulty of
teaching in a progressive manner; (b) the
obstacle of personality traits at variance with
progressive ways of teaching. Since the
hypotheses of this study are derived from
these two lines of thought, their further
elaboration is in order.

A traditional viewpoint of education has 
well-defined goals that are relatively easy to 
assess and allow for uniform ways of teach-
ing. In contrast, progressive goals such as
creativity, self-actualization, and sensitivity
are heuristic in nature and do not imply
specific and clear-cut methods of teaching.
Hence, teachers’ perception of ways and 
means to achieve these goals may be af- 
fected, and a gap between goals and knowl- 


The suggestions of John E. Hofman on an earlier 
draft of the article are greatly appreciated. 
edge of relevant teaching strategies may
arise. As a result, teachers’ expectations of
achieving progressive goals may be lower 
than their attitudes would suggest. 

Accordingly, two hypotheses of this study 
concern themselves with two kinds of 
gaps—between attitudes toward progressive 
goals and the knowledge about appropriate 
methods of teaching, on the one hand, and 
between these same attitudes and goal ex- 
pectations, on the other. 

As to the second problem, that is, per- 
sonality traits at variance with progressive 
teaching, it is hypothesized that behavior 
will be related to attitudes only if these are 
congruent with personality traits. The traits 
under study are open-mindedness and
closed-mindedness as measured by the Dog-
matism Scale (Rokeach, 1960). Progressive
and traditional attitudes may be shown to be
consistent with open-mindedness and
closed-mindedness, respectively. Progressive
attitudes and open-mindedness involve such
elements as flexibility, permissiveness, and
tolerance of ambiguities. Traditional atti-
tudes and closed-mindedness share elements
of authoritarianism, hierarchy, and struc-
ture. Thus, an open-minded teacher who
professes progressive attitudes is likely to 
generate a permissive atmosphere, allowing 
for a flexible program of studies suitable for 
a wide range of individual differences, 
whereas a closed-minded teacher who pro- 
fesses traditional attitudes is apt to prefer a 
more structured program of studies condu-
cive to exertion of authority in a restricted
atmosphere. However, this consistency does 
not occur in all cases, and as Rokeach (1960) 
claims, dogmatism may be entirely inde- 
pendent of attitudinal content. Thus, four 
groups of teachers are identified: progressive 
and open-minded; progressive and closed- 
minded; traditional and open-minded; tra- 
ditional and closed-minded. Only in cases 
where personality traits and attitudes incline 
toward congruency are attitudes hypothe- 
sized to relate to teaching behavior. 

In sum, two sets of hypotheses were tested. 
One deals with the gap between attitudes 
toward educational goals, the knowledge to 
teach in line with these goals, and expecta- 
tions of goal achievement. The other deals 
with the relation among attitudes, person- 
ality traits (closed-mindedness and open- 
mindedness), and classroom teaching be- 
havior, claiming that teaching behavior fol- 
lows attitudes only if these are congruent 
with personality traits. 


Method 
Population 


Questionnaires were sent to the entire population of 
teachers of Grades 3-6 in the northern and central re- 
gions of Israel. Teachers in the lower grades were ex- 
cluded, as their concern with basic skills might interfere 
with goal orientation. Also excluded were teachers who
teach in schools for the disadvantaged, as their pupil 
population might influence expectations of achieve- 
ment. The sample in the study consisted of the 261 out 
of 300 teachers who returned entirely completed ques- 
tionnaires. 


Measures 


Questionnaires for assessing attitudes toward edu- 
cational goals were developed for progressive and tra- 
ditional goals on the basis of available literature 
(Bereiter, 1972; Bloom, 1956; Bruner, 1960; Dewey,
1902; Dodl, 1966; French, 1957; Maslow, 1968; Rogers,
1961, 1969; Suchman, 1961; Torrance, 1963).

As a first step, 10 professors in the educational field 
were asked to classify a list of goals into four groups: (a) 
cognitive-traditional; (b) affective-traditional; (c)
cognitive-progressive; (d) affective-progressive. Only
goals on which 9 out of 10 judges agreed were included.
The final list included 12 goals divided as follows: 6 
traditional goals—basic skills, training in specific and 
well-defined thinking skills, study habits, citizenship, 
attitudes toward work, and patriotism—and 6 pro-
gressive goals—independent study, creativity, location
of gaps and problems, tolerance, personal style, and
self-actualization.

Next, the list was also given to 40 elementary school
teachers, half of whom were identified by peers and 
principals with progressive attitudes, the other half with 
traditional ones. Each of these groups preferred goals 
congenial to their point of view; differences in choice 
were in the expected direction and statistically signifi- 
cant. 

As a final step, the list of goals was cast into a 
forced-choice format in order to prevent a tendency on 
the part of subjects to attach equal importance to all
educational goals. Subjects were required to rank-order
the educational goals (a) in terms of their attitudes
toward the listed goals, (b) in terms of knowledge about 
how to achieve these goals, and (c) in terms of expec- 
tations of achieving the goals. Attitude, knowledge, 
and expectation scores were computed for traditional 
and progressive goals by adding the rank-order assigned 
to these goals. This was done separately for the
cognitive and affective domain of goals. The score of each
of these subgroups ranged from 6 to 15. The difference 
scores were also computed: one for the gap between 
attitude and knowledge scores, and the other between 
attitude and expectation scores. As each score could
range from 6 to 15, difference scores could vary from -9
to +9. It should be noted that scores were given on a
ranking basis; hence, they reflect differing emphasis and 
priority rather than exclusivity. 
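
The scoring arithmetic above can be illustrated with a short sketch. It assumes, consistent with the stated range of 6 to 15, that each domain holds six goals (three traditional and three progressive) ranked 1 to 6. The assignment of goals to the cognitive domain, the function names, and the sample rankings are hypothetical and serve only to show the computation.

```python
from itertools import permutations

# Assumed cognitive-domain goals (the source does not list the split explicitly).
PROGRESSIVE = ("independent_study", "creativity", "gaps_and_problems")
TRADITIONAL = ("basic_skills", "thinking_skills", "study_habits")
GOALS = PROGRESSIVE + TRADITIONAL


def subgroup_score(ranking, goals):
    """Sum the ranks (1 = most important) assigned to one subgroup of goals."""
    return sum(ranking[g] for g in goals)


def gap(first, second, goals):
    """Difference score: rank sum under `first` minus rank sum under `second`."""
    return subgroup_score(first, goals) - subgroup_score(second, goals)


# Hypothetical teacher: ranks progressive goals first in attitude,
# but feels most knowledgeable about the traditional ones.
attitude = dict(zip(GOALS, (1, 2, 3, 4, 5, 6)))
knowledge = dict(zip(GOALS, (4, 5, 6, 1, 2, 3)))

prog_gap = gap(attitude, knowledge, PROGRESSIVE)  # 6 - 15 = -9
trad_gap = gap(attitude, knowledge, TRADITIONAL)  # 15 - 6 = +9

# Every possible ranking yields a subgroup score between 6 and 15.
all_scores = {sum(p[:3]) for p in permutations(range(1, 7))}
```

Because the six ranks in a domain always sum to 21, the traditional score is 21 minus the progressive score, so under these assumptions a gap in one subgroup is mirrored with opposite sign in the other.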


Criterion Groups 


Three groups of teachers were identified with the help 
of Rokeach's (1960) D scale and the questionnaire on 
goals described above: (a) progressives, who scored in 
the upper third on open-mindedness tested by the
Dogmatism Scale and progressiveness measured by the
attitude questionnaire developed here; (b) traditionalists,
who scored in the upper third on closed-mindedness,
that is, dogmatism, and traditionalism; (c)
pseudoprogressives, who scored high on closed-mindedness
and on progressivism. Only the last of these
groups was hypothesized to behave inconsistently in the
classroom, that is, to show behavior consistent with a
traditional point of view despite professed progressive
goal preferences. The fourth possible group, traditional
teachers who are open-minded, was not studied, since
the focus of this study, as was mentioned, was on
progressive attitudes, which are perceived as socially
desirable. Teachers may tend to profess them as a result
of conformism, and when that is the case, teaching
behavior may not relate to attitudes.


Classroom Observation 


Forty-eight teachers, randomly chosen from each of 
the criterion groups, were observed by trained observers
whose recorded protocols attained 92% agreement.
Observers recorded 2 lessons per teacher, 96 lessons in all,
given on 2 different days at different times of day. The
unit of analysis was a content unit defined as an episode
(Smith & Meux, 1970). Each episode was analyzed in
terms of its relation to the cognitive goals listed in the
questionnaire. It should be noted that goals in the
affective domain were not analyzed because they cannot
be operationally translated into teaching behaviors. 
Degree of agreement between two analyzers was 94%. 
The percent of traditional and progressive episodes out 
of the total was computed for each teacher. These 
values yielded a profile of traditional and progressive 
teaching behavior relative to the total behavior recorded.


Results 


Because of the forced-choice technique, 
gaps in the traditional and in the progressive 
domain are equal in size but opposite in di- 
rection—a gap score of 7 in one domain 
would imply a gap of -7 in the other.
Therefore, as far as gaps are concerned, only 
findings in the progressive domain of goals 
are reported. 


Teachers' Attitudes, Knowledge, and 
Expectations 


A comparison between attitudes and rel- 
evant knowledge of teaching means and 
strategies, on the one hand, and expectations 
of goal achievement, on the other, yielded 
the following findings: 

Table 1 shows that rank differences between attitudes
and knowledge were significant at the p < .001 level.
Teachers ranked higher on attitudes toward progressive
educational goals than on their perceived knowledge to
attain them in both cognitive and affective domains.


Table 1
Difference Scores Between Attitudes and Perceived Knowledge for Progressive Goals

              Attitudes        Knowledge
Domain        M      SD        M      SD     Difference score   SE     t

Cognitive     7.9    2.03      13.2   2.39   -5.3               .18    28.65*
Affective     10.9   2.60      13.5   .32    -2.6               .18    8.51*

Note. N = 261; df = 260.
*p < .001.


Difference scores between attitudes and 
expectations were found to be significant at 
the p < .001 level in the cognitive as well as 
in the affective domains of goals. It seems 
that teachers do not expect to attain pro- 
gressive educational goals to the same extent 
that they express favorable attitudes toward 
them. Tables 1 and 2 also show that gaps
between attitudes and knowledge scores and 
between attitudes and expectations scores 
are larger in the cognitive than in the affec- 
tive domain. In other words, there is more 
congruity among variables in the affective 
than in the cognitive domain. 


Teacher Behavior in the Classroom 


The general impression that teacher be- 
havior tends to be more consistent with the 
traditional viewpoint is supported by the
data reported in Table 3. The picture 
changes, however, when behavior is analyzed 
separately for the three groups of teachers. 
Now, significant differences are found in the 
hypothesized direction. 

Table 2
Difference Scores Between Attitudes and Expectations

              Attitudes        Expectations
Domain        M      SD        M      SD     Difference score   SE            t

Cognitive     7.9    2.03      12.8   2.66   -4.9               [illegible]   [illegible]
Affective     10.9   2.60      12.9   2.58   -2.0               [illegible]   5.38

Note. N = 261; df = 260.
*p < .001.


Table 3
Percentage of Time Engaged in Traditional and Progressive Teaching Behaviors by Various Teacher Groups

                                                          Teacher group
              All observed
              teachers             Traditional                 Progressive          Pseudoprogressive
Behavior      M      SD    SE      M      SD    SE             M      SD    SE      M      SD    SE            F

Traditional   68.8   4.2   2.1     85.5   5.4   [illegible]    46.1   5.4   2.5     75.8   3.5   2.36          43.5*
Progressive   31.1   3.9   2.5     14.5   6.1   2.6            53.9   6.1   2.9     24.2   3.9   [illegible]   86.6*

Note. df = 2, 45.
*p < .01.


Table 3 indicates clearly that progressives
and traditionalists behave consistently with
their personality traits, but pseudoprogressives
do not. This supports the hypothesis
that personality traits and attitudes
must be congruent to be reflected in teaching
behavior.
The extent to which teaching behavior is
predictable from attitude content was determined
by multiple regression analysis (Table 4). Results
indicate that the belief system, as tested by the
Dogmatism Scale, when congruent with attitudes,
accounts for 53% of the variance in traditional
behavior and for 48% in progressive behavior. Thus,
teaching behavior can reliably be predicted by
knowledge of personality traits and attitudes.


Table 4
Dogmatism and Attitudes as Predictors of Teaching Behavior

Behavior      Predictor                R²     B      F

Traditional   Dogmatism                .28    .50    24.1*
              Traditional attitudes    .53    .49    93.1*
Progressive   Dogmatism                .25    .48    19.5*
              Progressive attitudes    .48    .47    19.2*

Note. df = 2, 45.
*p < .01.


Discussion

Findings tend to support the hypotheses
of the study. First, teachers profess favorable
attitudes toward progressive educational
goals but have a relatively low perception
of the knowledge conducive to goal
attainment and low expectations of achieving
them. Second, the consistency between
attitudes and classroom behavior is indeed
contingent on the congruence between attitudes
and personality traits. Some implications
of these findings for teacher education
follow:

1. As mentioned above, the heuristic na- 
ture of progressive goals, requiring flexibil- 
ity, creativity, and ingenuity on the part of 
the teacher, fails to prescribe a single best 
way to attain these goals. Teachers without 
clear guidelines may well develop a sense of
helplessness, behind which they can hide in
order to justify traditional teaching behavior.

There is the additional possibility that
teacher training does not equip students
with the necessary understanding and tools.
Lectures about creativity, tolerance, and
self-actualization are not effective enough
unless training works through these processes
(McLuhan, 1964). Even if progressive
attitudes are developed in the course of
teacher training, these are not necessarily
followed by appropriate behavior.

Teacher educators should be aware of
these pitfalls and consider them during
curriculum planning. Teaching in method
courses should follow strategies that these
courses advocate. They must develop
awareness of the heuristic nature of
progressive teaching and foster autonomy and
creativity so that student teachers will not
expect detailed prescriptions on how to
teach.
2. Teachers’ expectations are thought to
have an effect on pupil achievement
(Rosenthal & Jacobson, 1968); hence, teacher
educators should be wary of generating low 
expectations with regard to progressive 
goals. The teacher-education curriculum 
should aim at teacher trainees' awareness of 
and sensitivity to pupil potential and 
strengthen their faith in the prospect of 
achieving progressive goals, thus raising the 
level of expectations. 

3. The larger gaps between attitudes and 
knowledge and between attitudes and ex- 
pectations in the cognitive as compared to 
the affective domain may find their expla- 
nation in the perception of cognitive devel- 
opment as being more useful to pupils living 
in a technological era. Criteria for success 
in life have more cognitive than affective 
connotations. Also, disappointment with
the outcomes of permissiveness, of which the
progressive movement is sometimes accused,
may have encouraged teachers to return to
traditional ways in the classroom. These
and other possible reasons for large discrepancies
among attitudes, knowledge, and
expectations should motivate educators to 
search for causes and remedies and to discuss 
them with student teachers. 

4. Depending on one's school of thought, 
two alternatives suggest themselves as a result
of findings concerning the attitude-behavior
link: (a) Teacher training could
focus on behavior modification and bypass
the problem of attitude change. Since
teacher behavior does not relate to attitudes
when these are not consistent with personality
traits, there is no good reason for
bothering with attitude change. It may well
be more expedient to train student teachers
in specific behaviors and let attitude change
occur as the result of behavior change (Janis
& King, 1954). (b) Teacher training might
proceed in accordance with the progressive
goal of awareness of individual differences
without necessarily advocating progressive
goals in general. If both traditional and
progressive ways are perceived as legitimate,
teachers will feel free of pressure and the
need to conform to attitudes considered socially
desirable. As a result, the phenomenon
of “pseudoism” will be avoided.
Summary


This study dealt with the gap between
teachers’ attitudes toward progressive
educational goals and the lack of their
implementation in the classroom. Findings
indicate two sources for the discrepancy: (a)
a perception of insufficient knowledge about 
how to implement progressive goals in the 
classroom and low expectations of achieve- 
ment; (b) personality traits incompatible 
with progressive teaching behavior. Some 
suggestions for teacher education and in- 
service training have been put forward. 


References 


Bereiter, C. Moral alternatives to education.
Interchange, 1972, 3, 25-41.

Bloom, B. S. Taxonomy of educational objectives.
New York: Longmans Green, 1956.

Bruner, J. S. The process of education. Cambridge, 
Mass.: Harvard University Press, 1960. 

Dewey, J. Democracy and education. New York: 
MacMillan, 1902. 

Dodl, N. R. Pupils’ questioning behavior in the context 
of classroom interaction. Unpublished doctoral 
dissertation, Stanford University, 1966. 

French, W. Behavioral goals of general education in
high school. New York: Russell Sage Foundation,
1957. 

Janis, I. L., & King, B. T. The influence of role playing 
on opinion change. Journal of Abnormal Social 
Psychology, 1954, 49, 211-218. 

Lamm, Z. Contradictory logics in teaching. Tel-Aviv, 
Israel: Sifriat Poalim Ltd., 1973. 

Maslow, A. H. Some educational implications of the
humanistic psychologies. Harvard Educational 
Review, 1968, 38, 685-696. 

McLuhan, M. Understanding media. New York:
McGraw-Hill, 1964. 

Rogers, C. Freedom to learn. La Jolla, Calif.: Center 
for the Studies of the Person, 1969. 

Rogers, C. On becoming a person. Boston: Houghton 
Mifflin, 1961. 

Rokeach, M. The open and closed mind. New York: 
Basic Books, 1960. 

Rosenthal, R., & Jacobson, L. Pygmalion in the classroom.
New York: Holt, Rinehart & Winston, 1968.

Smith, B. O., & Meux, M. O. A study of the logic of
teaching. Urbana: University of Illinois Press,
1970.

Suchman, J. R. Inquiry training: Building skills for
autonomous discovery. Merrill-Palmer Quarterly,
1961, 7, 147-169.

Torrance, E. P. Guiding the creative talent. Engle- 
wood Cliffs, N. J.: Prentice-Hall, 1963. 


Received December 29, 1977 
Revision received May 11, 1978


Journal of Educational Psychology 
1978, Vol. 70, No. 6, 998-1009 


Student Characteristics, Classroom Processes,
and Student Achievement 
A comparison between two different approaches to apportioning variance in
posttest achievement scores was made with data on 29 lessons on one social
studies topic. One method led to exaggerated estimates of the contribution
of student characteristics through ignoring joint contributions of student
characteristics and process variables. Comparisons were also made among
three different criterion measures in exploring process-product correlations.
Residualized achievement scores using, first, the individual student and,
second, the class as the statistical unit were used, as well as unadjusted
achievement scores. The same set of process variables tended to correlate
significantly with all three product measures, though some differences were noted.
Relationships among student characteristics, process variables, and student
achievement indicated possible teacher expectancy effects.


One issue affecting process-product re- 
search on teaching is the way in which dif- 
ferences in such student characteristics as 
general ability and prior knowledge are 
taken into account in discerning the effects 
on student achievement of the classroom 
processes under observation. Pedhazur 
(1975) discusses three approaches to variance
partitioning in nonexperimental research.
They are (a) incremental partitioning
of variance, in which variables are
entered in order and the additional propor- 
tion of criterion variance accounted for by 
each entry is assessed; (b) partitioning re- 
sidual criterion variance, in which the cri- 
terion variable is first residualized on se- 
lected control variables and then the residual 
criterion variance is partitioned among the 
treatment variables; and (c) commonality 
analysis, in which criterion variance is partitioned
into proportions attributable to all
combinations of predictors and to unique 
contributions of lone predictors. 

This study was supported by the Australian Advisory
Committee for Research and Development in Education.

Requests for reprints should be sent to Michael
Dunkin, School of Education, Macquarie University,
North Ryde, New South Wales, Australia 2113.

Copyright 1978 by the American Psychological Association, Inc. 0022-0663/78/7006-0998$00.75


From Rosenshine's (1971) description,
partitioning residual criterion variance (b,
above) has been the most common procedure
in process-product research on teaching.
That is, posttest scores have been adjusted
for differences in student characteristics,
such as general ability and prior knowledge,
and the challenge has been to account for the
variance in residual scores with process
variables. One of the assumptions of this
procedure is that the student characteristics
concerned are independent of process variables
in their contribution to variance in
student achievement. However, that
assumption is not necessarily valid, and, as
Baumgart (1977) notes, there can be serious
problems in the use of the above variance
partitioning techniques when predictor
variables are highly correlated.
Figure 1 illustrates a configuration of
relationships among student characteristics,
process variables, and student achievement
for which procedure b above would seem less
appropriate than procedure c. Here student
characteristics and process variables are
intercorrelated in a way that affects their
relationships with student achievement. Area
x represents the proportion of variance in
student achievement uniquely accounted for
by differences in student characteristics over
and above that accounted for by process
variables. Area y represents the proportion
of variance in student achievement uniquely
accounted for by differences in process
variables beyond that accounted for by student
characteristics. Area z represents the
proportion of variance in student achievement
accounted for jointly by student
characteristics and process variables.

[Figure 1: diagram with areas labeled A (Student Achievement), B (Student Characteristics), and C (Process Variables)]

Figure 1. Model of variance in student achievement uniquely and jointly attributable to student
characteristics and process variables. (Area x represents the proportion of variance in student
achievement uniquely accounted for by differences in student characteristics over and above that
accounted for by process variables. Area y represents the proportion of variance in student achievement
uniquely accounted for by differences in process variables beyond that accounted for by student
characteristics. Area z represents the proportion of variance in student achievement accounted for jointly
by student characteristics and process variables.)

The application of procedure b in the situation
represented in Figure 1 would result
in both areas x and z being attributed to the
influence of student characteristics only.
Area z, representing the joint contribution
of student characteristics and process variables,
would remain unidentified, and process
variables could be seen to have influence
only through their unique contribution
represented in area y. In view of these types
of problems, Cooley and Lohnes (1976) argue
for the procedure represented in Pedhazur's
(1975) third type of approach, that of
commonality analysis, in which confounded
effects are identified and described along with
unique contributions of predictors. One of
the purposes of the present study was, then,
to compare the results of applying both
Pedhazur's procedure b and procedure c to
the same set of data.

Related to the considerations above is the
question of which of a number of alternative
criterion measures of student achievement
is the most appropriate to use in exploring
process-product correlations. The usual
procedure, according to Rosenshine (1971),
has been to use residualized achievement
scores as the criterion or product variable.
However, there are at least two ways of
arriving at these represented in the research
literature. One way is illustrated in the
Wright and Nuthall (1970) study, which
used multiple regression analysis to determine
the relationship of intelligence and
prior knowledge to achievement test scores
for individual students, and then, from
application of the multiple regression formula,
a predicted posttest score and a residual
(actual minus predicted) posttest score were
calculated for each individual student.
Class means of these individual residual
scores were calculated and used as criterion
scores in explorations of process-product
relationships.

The statistical unit in the procedure used 
by Wright and Nuthall (1970) was the indi- 
vidual student. It could be argued, however, 
that since the sampling unit in their study 
was the teacher or the class and not the in- 
dividual student, the class should have been 
the statistical unit in the calculation of cri- 
terion achievement measures. Had this 
approach been used, only class means for 
intelligence and prior knowledge would have 
been used in the multiple regression analysis, 
and the multiple regression formula would 
have been applied to produce single predicted
posttest scores and single residual
posttest scores for each class. The latter
scores would then have been used as the
criterion scores in investigating
process-product relationships.

Finally, it has been argued (Carver, 1969; 
Cronbach and Furby, 1970) that the most 
appropriate criterion measure of student 
achievement for the type of research re- 
ported here is the mean class unadjusted 
posttest score, arrived at simply by summing 
the individual raw posttest scores within 
each class and dividing the sum by the 
number of students tested in that class. 
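
The three candidate criterion measures described above can be compared in a minimal sketch. It is not the authors' computation: the data are synthetic, the function and variable names are hypothetical, and ordinary least squares with an intercept is assumed for the regression step.

```python
import numpy as np


def ols_residuals(X, y):
    """Residuals (actual minus predicted) from a least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


def class_means(values, class_ids):
    """Mean of `values` within each class, ordered by sorted class id."""
    classes = np.unique(class_ids)
    return np.array([values[class_ids == c].mean(axis=0) for c in classes])


def criterion_measures(ability, prior, posttest, class_ids):
    """Return the three class-level criterion scores discussed in the text."""
    controls = np.column_stack([ability, prior])
    # 1. Unadjusted: class mean of raw posttest scores.
    unadjusted = class_means(posttest, class_ids)
    # 2. Student as statistical unit: residualize each student, then average.
    residual_student = class_means(ols_residuals(controls, posttest), class_ids)
    # 3. Class as statistical unit: average first, then residualize class means.
    residual_class = ols_residuals(class_means(controls, class_ids), unadjusted)
    return unadjusted, residual_student, residual_class


# Synthetic data for illustration only: 6 classes of 20 students each.
rng = np.random.default_rng(0)
class_ids = np.repeat(np.arange(6), 20)
ability = rng.normal(100, 15, size=120)
prior = rng.normal(50, 10, size=120)
posttest = 0.2 * ability + 0.3 * prior + rng.normal(0, 5, size=120)

unadj, res_student, res_class = criterion_measures(ability, prior, posttest, class_ids)
```

The second and third measures generally differ because the regression weights estimated from individual students need not equal those estimated from class means.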

A second purpose of the present study was 
to compare the use of all three of the above 
types of criterion measures of student 
achievement in exploring process-product 
relationships. Student characteristics of 
intelligence and prior knowledge have often 
been used in previous classroom research as 
contributors to variance in posttest 
achievement scores, as bases for adjusting 
those scores, and as correlates with process 
variables. However, it is usually the case 
that when variance in posttest achievement 
is apportioned between student intelligence
and/or prior knowledge and process variables,
substantial proportions of the variance
remain unaccounted for. For example,
Wright and Nuthall (1970) found that
intelligence and prior knowledge together
accounted for 30% of the variance in achievement
test scores and that, of the remaining
variance, differences between classes
accounted for approximately 14%. Thus, a
considerable proportion of the variance re- 
mained unaccounted for. In view of the 
possibility that student characteristics other 
than intelligence and prior knowledge might 
operate independently of them to produce 
differences in student achievement, two 
other student characteristics were re- 
searched in this study. They were student 
anxiety and student dogmatism.

Another important purpose of this study 
was to investigate a suggestion made by 
Rosenshine and Furst (1971). After re- 
viewing process-product research employing 
"student opportunity to learn" as a variable 
and concluding that it was “a consistent and 
significant correlate of student achievement" 
(p. 62), Rosenshine and Furst suggested that 
future research determine the relevance of
the classroom activities under observation
to the criterion tests of student achievement.
Such measures of relevance have been used
as covariates to further adjust posttest scores
before searching for teacher behaviors correlated
with the adjusted posttest measures,
but Rosenshine and Furst recommended
that they “also be used as a correlate of
achievement, as a relevant teacher behavior
contributing to student achievement” (p.
63). In this study relevance was used in that
way.
Acceptance of the value of using relevance
as a potential correlate with student
achievement brought with it the possibility
that the effectiveness of teaching behaviors
is contingent upon their relevance. For example,
teacher questioning might depend for
its effectiveness on the extent to which the
student responses solicited are relevant to
the criterion test of student achievement.
Therefore, several of the process variables
employed in this study were coded on the
basis of their relevance to the criterion test
of student achievement.


Method 


Subjects 


[...] they belonged, their age, years of experience, and sex.
In order to ensure variation in teaching behavior
likely to be employed in the study, [...]
Interaction Analysis Categories (FIAC)
was used as the screening instrument, and on the basis
of measures commonly yielded from FIAC, 29 teachers
exhibiting as wide a range of the observed teaching
behavior as possible were selected for the study proper.
The classes involved in the investigation were those 
usually taught by the selected teachers. In all, 1,016 
students attended the lessons analyzed in the study, but 
because of absences from testing sessions or transfers 
to the class after initial test data were gathered, com- 
plete test data were gathered for only 827 of them. 


Content of Lessons 


In order to minimize the potential effects of specific 
prior knowledge of the content of the lessons, the 
teachers were asked to teach one lesson on a topic un- 
likely to be familiar to either the teachers or the stu- 
dents. The topic was the stone money of Yap, which 
was chosen also because it was compatible with the 
general social studies curriculum guidelines existing for
that grade level in the New South Wales school systems.
Teachers were asked to limit lesson length to 30 minutes
to conform to the regular timetabled length of social
studies lessons. They were also asked to plan lessons
of the discussion type. Each teacher was supplied with 
an information brochure on the topic 1 week in advance 
of the lesson and was asked to confine the content of the 
lesson to that contained in the brochure. 


Process Variables 


Six main types of process variables were selected on 
the basis of major reviews of classroom behavior research
literature (Dunkin & Biddle, 1974; Rosenshine,
1971). These were content coverage, vagueness, teacher 
structuring, teacher solicitations, pupil responses, and 
teacher reactions, the last four being defined according 
to Bellack, Hyman, Smith, and Kliebard (1966). 
Within each of these at least two variables were coded 
as follows: 

Content coverage. Variable 1: the number of different
posttest items for which relevant information
occurred at least once during the lesson; Variable
2: the number of different posttest items for which
relevant information occurred at least twice during the
lesson, that is, item repetition.

Vagueness. Variable 3: the number of times
teacher talk contained inclusions in the list of most
frequently occurring vagueness responses compiled by
Hiller, Fisher, and Kaess (1969, p. 671); Variable 4: the
number of times pupil talk contained inclusions in the
list; Variable 5: the ratio of Variable 3 to the sum
of Variables 1 and 2; Variable 6: the ratio of Variable
4 to the sum of Variables 1 and 2.

Teacher structuring. Variable 7: the number of
different posttest items for which relevant information
occurred at least once in teacher structuring moves;
Variable 8: the number of different posttest items for
which relevant information occurred at least twice in
teacher structuring moves.

Teacher soliciting. Variable 9: the number of
teacher solicitations relevant to posttest items; Variable
10: the number of higher order (those involving
explaining, evaluating, classifying, comparing and
contrasting, and conditional inferring) teacher solicitations
relevant to posttest items; Variable 11: the number of
lower order (those involving describing, defining, stating,
reporting, opining, and designating) teacher
solicitations relevant to posttest items; Variable 12: the
ratio of Variable 10 to Variable 9.

Pupil responding. Variable 13: the number of
different posttest items for which relevant information
occurred at least once in pupil responses; Variable
14: the number of times pupils participated in
interactions that contained information relevant to posttest
items divided by class size, that is, the mean number of
times each pupil engaged in those types of interactions;
Variable 15: the percentage of pupils who participated
verbally at least once during the lesson.

Teacher reacting. Variable 16: the number of
positive teacher reactions to posttest item-related pupil
responses; Variable 17: the number of negative teacher
reactions to posttest item-related pupil responses;
Variable 18: Variable 16 as a percentage of the total
number of teacher reactions.

Length of lesson. Variable 19: the number of 10-sec
units of lesson duration.

The only other classroom variable employed in the 
study was class size (Variable 20), that is, the number 
of students in each class who were present when the 
lesson was given. 


Student Characteristics 


Four instruments were used to measure student 
characteristics prior to the lessons. These were as fol- 
lows: 

Variable 21—the ACER (Australian Council for 
Educational Research) Intermediate Test D, a test of 
general ability used by the New South Wales Depart- 
ment of Education. In schools where recent scores on 
the test were available as a consequence of official ed- 
ucation department testing, the test was not readmin- 
istered. It was administered as part of this project only 
in those schools where recent scores were not avail- 
able. 

Variable 22—ISKIT (Identification of Skills in 
Teaching) Parts One and Two were modified versions 
of the Sequential Tests of Educational Progress (STEP) 
in Social Studies, Form 4A (1957). This test was de- 
signed to measure (a) skills—for example, identifying 
generalizations, making inferences, interpreting infor- 
mation, application of principles—and (b) knowledge 
of specific subject matter. Modifications of two types 
were made by replacing certain words and forms of 
spelling judged unfamiliar to Australian primary school 
children (such as apartment house, ball player, grain 
elevator, traveled) and by replacing unfamiliar curric- 
ulum content, especially that which required a partic- 
ular knowledge of North America. Only the combined 
scores for the two parts were used in this study. 

Variable 23—“How You Feel” was a slightly modi- 
fied version of the Test Anxiety Scale for Children 
(TASC) developed by Sarason, Davidson, Lighthall, 
Waite, and Ruebush (1960). Some of the modifications 
were those made by Gaudry (Note 1) in adapting the 
test for Australian conditions. Others involved simple
substitutions of terminology to better reflect Australian
usage.
Variable 24—"What Do You Think?" was a modified 
version of Figert's (1968) “What Do You Think?", which 
was an adaptation of the Dogmatism Scale (Rokeach, 
1960) for use in elementary schools. Again some 
changes of terminology were made better to approxi- 
mate local usage. In addition, attempts were made to 
clarify some explanations of response alternatives and 
the method by which respondents record their
choices. 


Student Achievement 


The last test instrument was the lesson-specific 
posttest on the Stone Money of Yap. This was a 54- 
item objective test with three response alternatives for 
each item. Items were designed to measure success in 
application of the categories of logical operations pre- 
sented by Smith and Meux (1962). 

An original pool of 80 items was tried out with a 
sample of two Grade 6 classes (N = 74 students) not 
otherwise involved in the study. Both classes were 
given a standard tape-recorded presentation of the 
content of the brochure on the Stone Money of Yap and 
were then asked to answer the 80 test items. Conven- 
tional procedures of item analysis were applied, and 26 
items were rejected. The remaining 54 items, some of 
which were modified on the basis of the trial, made up 
the final form of the test, for which the Cronbach alpha 
coefficient was .69. There were 28 lower order items 
involving describing, designating, defining, stating, and 
reporting and 26 higher order items involving classi- 
fying, explaining, conditional inferring, and comparing 
and contrasting. 

Data gathered with this test were used to arrive at 
three alternative measures of student achievement, as 
follows: 

Variable 25—Unadjusted achievement, that is, the 
mean for each class of the raw scores obtained by 
individual students in that class, 

Variable 26—Residual achievement A, that is, the 
mean for each class of the residual scores obtained for 
individual students in that class, 


Procedures 


Since the lesson on the Stone Money of Yap occurred 
after a series of two taught by the teachers for the larger 
research project (Doenau, 1977) to which this study was 
an addition, data concerning student characteristics 


ables described above, 


Results 


Attributing Variance in Posttest Scores 


As stated above, one of the purposes of the 
study was to compare the results of applying 
two different procedures described by 
Pedhazur (1975) for partitioning variance in 
criterion variables in nonexperimental research. 
The first of these procedures involved 
partitioning residual criterion variance 
where the criterion variable was first 
residualized on selected control variables 
and then the residual criterion variance was 
partitioned among the predictor variables. 

Multiple regression analysis was used to 
determine the relationships of the class 
mean scores for the student characteristics 
measured in this study (see Table 1) to the 
class mean unadjusted achievement scores. 
From application of the multiple regression 
formula, a predicted mean posttest score and 
a residual (actual minus predicted) mean 
posttest score were calculated for each of the 
29 classes. Once the effect of the ACER 
Intermediate D variable (Variable 21) was 
accounted for, none of the other student 
characteristics contributed significantly to 
the variance in the criterion. Since the 
correlation between class mean ACER 
Intermediate D scores and the class mean 
criterion test scores was .76, the student 
characteristics measured accounted for 
approximately 58% of the variance in mean 
class posttest scores.1 
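
The residualizing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class means below are hypothetical (they are not the study's data), a single predictor stands in for the student characteristics, and the helper name `residualize` is introduced here for convenience.

```python
from math import sqrt


def residualize(x, y):
    """Fit y on x by simple least squares; return (r, residuals).

    Each residual is the actual score minus the predicted score.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    r = sxy / sqrt(sxx * syy)
    return r, residuals


# Hypothetical class means for five classes (NOT the study's data).
ability = [100.0, 104.0, 108.0, 112.0, 116.0]
posttest = [22.0, 26.0, 25.0, 29.0, 30.0]

r, residuals = residualize(ability, posttest)
print("proportion of variance accounted for:", round(r ** 2, 2))

# The reported correlation of .76 corresponds to about 58% of the variance.
print(round(0.76 ** 2, 2))  # 0.58
```

The residuals from such a fit sum to zero by construction, which is why a residual achievement score carries only the variance left over after the control variable has been taken into account.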

A second multiple regression analysis was 
conducted with process variables only as 
independent variables and with residual 
achievement B scores (Variable 27) as the 
dependent variable. The process variables 
together yielded a multiple correlation 
coefficient of .58 and thus accounted for 
approximately 33% of the variance in the 
residual scores. 

In summary, the application of this procedure 
for partitioning variance suggested 
that the student characteristics measured 
accounted for 58% of the variance in mean 
class posttest scores, that process variables 
accounted for 33% of the remaining variance 
or approximately 14% of the original variance, 
and therefore, that a total of about 72% 
of the original variance was accounted for by 
both the student characteristics and the 
process variables. 
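
The arithmetic behind this summary can be checked with a short Python sketch. It assumes only the two coefficients reported in the text (.76 and .58); the function name `sequential_partition` is hypothetical and is not taken from Pedhazur (1975).

```python
def sequential_partition(r_control, r_residual):
    """Partition variance by residualizing the criterion first.

    r_control:  correlation of the control variable with the criterion.
    r_residual: multiple correlation of the predictors with the
                residualized criterion.
    Returns (control share, predictor share, total), each expressed
    as a proportion of the ORIGINAL criterion variance.
    """
    control_share = r_control ** 2
    remaining = 1.0 - control_share
    # The predictors explain a share of what remains, so that share
    # must be rescaled to the original variance.
    predictor_share = (r_residual ** 2) * remaining
    return control_share, predictor_share, control_share + predictor_share


students, process, total = sequential_partition(0.76, 0.58)
print(round(students * 100), round(process * 100), round(total * 100))
# 58 14 72
```

Note that in this procedure any variance shared by the two sets of variables is credited entirely to the set entered first.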

The other procedure described by 
Pedhazur (1975) and investigated here was 
the one involving commonality analysis, in 
which criterion variance was partitioned into 
proportions attributable to all combinations 
of predictors and to unique contributions of 
predictors. Again multiple regression 
analysis was applied with unadjusted class 
mean posttest scores as the dependent 
variable. In the first stage both student 
characteristics and process variables were 
used as independent variables in order to 
obtain an estimate of the total proportion of 
variance accounted for by both sets of 
variables. A multiple correlation coefficient of 
.89 was obtained, suggesting that 79% of the 
variance in the criterion variable had been 
accounted for. Next, multiple regression 
analysis was again employed with only 
process variables as the independent variables 
and with the unadjusted class mean posttest 
scores again as the dependent variable. This 
time a multiple correlation coefficient of .75 
was obtained, suggesting that process 
variables accounted for about 56% of the 
variance. Since it was already known that 
student characteristics accounted for about 58% 
of the variance, it was now possible to arrive 
at estimates for the proportions of variance 
in student achievement accounted for in 
each of the areas of Figure 1. 


Table 1 
Class Means and Standard Deviations of Variables Employed 

Variables                                                          M        SD 

Content coverage 
   1. Test item coverage                                         25.34     5.13 
   2. Test item repetition                                        8.03     4.47 
Vagueness 
   3. No. vague terms by teacher                                 81.86    38.06 
   4. No. vague terms by student                                 21.76    21.82 
   5. Ratio Variable 3 to Variable 1 + Variable 2                 2.67     1.31 
   6. Ratio Variable 4 to Variable 1 + Variable 2                  .67    [illegible] 
Teacher structuring 
   7. Test items covered in teacher structuring moves            21.38     6.08 
   8. Test items repeated in teacher structuring moves            4.24     2.86 
Teacher soliciting 
   9. No. of relevant teacher solicitations                       9.41     7.25 
  10. No. of relevant higher order teacher solicitations          2.72     3.22 
  11. No. of relevant lower order teacher solicitations           6.69     5.31 
  12. Ratio of Variable 10 to Variable 9                           .25      .21 
Pupil responding 
  13. Test items covered in student responses                     8.48     5.32 
  14. No. of relevant responses per student                        .43      .80 
  15. Percentage of students participating                       59.97    22.53 
Teacher reacting 
  16. No. of positive teacher reactions to relevant responses     8.31     6.65 
  17. No. of negative teacher reactions to relevant responses     1.55     1.76 
  18. Variable 16 as percentage of total teacher reactions       15.66    11.07 
Length of lesson 
  19. No. of 10 sec of lesson duration                          168.76    49.89 
Class size 
  20. No. of students present for lesson                         35.03     6.49 
General ability 
  21. ACER Intermediate D                                       108.26     7.45 
Prior knowledge 
  22. [illegible] 
Anxiety 
  23. Modified TASC                                              13.45     2.63 
Dogmatism 
  24. “What do you think?”                                      105.85     3.40 
Student achievement 
  25. Unadjusted achievement                                     26.28     3.17 
  26. Residual achievement A                                  [illegible] 
  27. Residual achievement B 

Note. ACER = Australian Council for Educational Research. 
ISKIT = Identification of Skills in Teaching. 
TASC = Test Anxiety Scale for Children. 

In terms of the model illustrated in Figure 
1, these estimates led to the conclusion that 
areas x, y, and z together accounted for 79% 
of A, that x and z together accounted for 
58%, and that areas y and z together ac- 
counted for 56%. Thus, areas x, y, and z 
accounted for 23%, 21%, and 35%, respec- 
tively. 
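
The three area estimates follow from simple subtraction, as the Python sketch below shows. It assumes the three squared multiple correlations reported in the text (.79, .58, and .56); the function name `commonality` and its argument names are introduced here for illustration only.

```python
def commonality(r2_total, r2_students, r2_process):
    """Two-set commonality analysis.

    r2_total:    variance accounted for by both sets (areas x + y + z).
    r2_students: variance accounted for by student characteristics (x + z).
    r2_process:  variance accounted for by process variables (y + z).
    Returns (x, y, z): unique to students, unique to process, joint.
    """
    x = r2_total - r2_process                # unique to student characteristics
    y = r2_total - r2_students               # unique to process variables
    z = r2_students + r2_process - r2_total  # shared by both sets
    return x, y, z


x, y, z = commonality(0.79, 0.58, 0.56)
print(round(x * 100), round(y * 100), round(z * 100))
# 23 21 35
```

The three parts sum to the total of 79%, and the joint area z is the portion that the residualizing procedure assigns wholly to student characteristics.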

When the results of the two procedures for 
apportioning variance were compared, it was 
found that the two approaches led to somewhat 
different estimates of the total proportion 
of variance accounted for by student 
characteristics and process variables (72% 
and 79%); to quite different estimates of the 
unique contribution of process variables 
(14% and 21%); and to similar estimates of 
the proportion of variance attributed to 
student characteristics (58% and 58%). 
However, application of the second procedure 
suggested that this last estimate was 
really made up of two components, the 
proportion attributable to the unique 
contribution of student characteristics (23%) and 
the proportion attributable jointly to process 
variables and student characteristics (35%). 
In other words, reliance on the first procedure 
alone would have led to an overestimation 
of the importance of student characteristics 
and to failure to recognize the 
importance of the joint contribution of student 
characteristics and process variables. 
Further discussion of the ways in which that 
joint contribution might have occurred is to 
be found below. 


Context—Process Correlations 


Table 2 shows that all the student 
characteristics variables (Variables 21, 22, 23, and 
24) correlated substantially with content 
coverage (Variable 1) and with positive 
teacher reactions to relevant student 
responses (Variables 16 and 18). Furthermore, 
student general ability (Variable 21), 
prior knowledge (Variable 22), and dogmatism 
(Variable 24) were substantially correlated 
with teacher vagueness (Variables 3 
and 5), and pupil anxiety (Variable 23) was 
similarly correlated with content coverage in 
teachers’ structuring moves (Variable 7). 
Correlations of this kind and magnitude are 
consistent with the previous finding that 35% 
of variance in unadjusted achievement is 
represented in area z of the model. 


Process-Process Correlations 


Approximately 44% of the 171 correlation 
coefficients among the process variables 
measured in this study were .31 or above (ps 
< .10). Thus, when teachers asked relevant 
questions they tended to receive relevant 
responses to which they tended to react 
positively. Similarly, since much of the 
content of the lessons was unfamiliar factual 
material about the island of Yap and its 
money, content was mediated principally by 
teacher informing 


Table 2 
Intercorrelations Among Student Characteristics, Process Variables, and Student Achievement Variables (29 Classes) 

Variables                                              2     3     4 

 1. Content coverage                                  60   -08    08 
 2. Content repetition                                      25    14 
 3. No. vague terms by teacher                                    26 
 4. No. vague terms by pupils 
 5. Ratio teacher vagueness 
 6. Ratio pupil vagueness 
 7. Teacher structuring-coverage 
 8. Teacher structuring-repetition 
 9. Relevant teacher solicitations 
10. Relevant higher order teacher solicitations 
11. Relevant lower order teacher solicitations 
12. Ratio higher order teacher solicitations 
13. Pupil responses-coverage 
14. Relevant responses per pupil 
15. Percent pupils participating 
16. Relevant positive teacher reactions 
17. Relevant negative teacher reactions 
18. Percent relevant positive teacher reactions 
19. Length of lesson 
20. Class size 
21. ACER Intermediate D 
22. Prior knowledge 
23. Anxiety 
24. Dogmatism 
25. Unadjusted achievement 
26. Residual achievement A 
27. Residual achievement B 

[remaining entries illegible] 

Note. Decimal points omitted. ACER = Australian Council for Educational Research; ISKIT = Identification of Skills in Teaching. For N = 29, p = .10 when r = .31, p = .05 when r = .36, p = .01 when r = .46. 


teachers of lower ability classes prepared 
their lessons less well, thus covering less 
content and with less clarity? Alternatively, 
it could be that in recognition of the ability 
of their classes, teachers varied in their at- 
tempts to cover content on pedagogical 
grounds. However, this latter possibility 
would not seem to account for the accom- 
panying variations in teacher vagueness. 


Process—Product Correlations 


As mentioned earlier in this article, three 
measures of student achievement were used 
in exploring process—product correlations in 
this study. One of those was the class mean 
unadjusted posttest score (Variable 25). 
Another was the class residual achievement 
score (residual achievement B, Variable 27 
in Tables 1 and 2), calculated using the class 
as the statistical unit. The third was the 
class mean residual achievement score 
(residual achievement A, Variable 26 in Tables 
1 and 2), calculated using the individual 
student as the statistical unit. 

It can be seen from Table 2 that four 
process variables correlated significantly (p < 
.05) with at least two of the achievement 
criteria. These were content coverage 
(Variable 1), teacher structuring-coverage 
(Variable 7), relevant responses per pupil 
(Variable 14), and the percent of relevant 
positive teacher reactions (Variable 16). In 
addition, two other process variables 
correlated significantly with at least one of the 
criterion achievement measures. These 
were the ratio of teacher vagueness (Variable 
5) and the ratio of relevant higher order 
solicitations (Variable 12). 

The correlations between content cover- 
age and all three measures of student 
achievement, on the one hand, and between 
the ratio of teacher vagueness and unad- 
justed student achievement, on the other, are 
especially salient in view of the possibility of 
teacher expectancy effects discussed above. 
If there is validity in the speculations, then 
there could well be evidence here that the 
prophecies are self-fulfilling. 

Another interesting set of process-product 
findings concerns the performance of ratio 
measures in comparison with absolute 
measures as correlates of student achievement. 
Only 2 of the 13 absolute process 
variables used in this study correlated 
significantly with a student achievement 
variable. They were content coverage and 
teacher structuring-coverage, themselves 
highly interrelated. In comparison, four of 
the six ratio measures correlated 
significantly with at least one of the student 
achievement variables. 

Discussion 


Although the correlational design of this 
study does not lend itself to conclusions of 
the cause and effect type, attempts to ex- 
plain the results achieved necessarily involve 
causal reasoning. These attempts should be 
accepted as hypotheses for the guidance of 
subsequent research. It is to be hoped that 
the evidence provided in this study is at least 
consistent with, if not demonstrative of, the 
validity of those hypotheses. 

The model portrayed in Figure 1 implies 
that there are two main sources of influence 
upon the type of student achievement 
measured in this study: student characteristics 
that determine capacity to benefit from 
instruction, and the instruction itself. The 
model also implies that those two influences 
are not independent of each other and that 
their interrelationships will also influence 
student achievement. 

Although the evidence presented in the 
study indicates that on the basis of student 
characteristics alone, considerable success 
could have been achieved in predicting 
student achievement, such achievement could 
not thereby be explained, for it is very 
unlikely that even the most able students could 
have answered correctly test items about a 
topic as unusual as the Stone Money of Yap, 
except by guessing. Clearly then, exposure 
to the specific content was required, and in 
this study that exposure occurred during the 


between two instructed groups of different 
ability. 

But in this study, and in the real world of 
the classroom, instruction is not merely 
present or absent. It exists in varying 
amounts and with varying qualities. The 
evidence provided in this study is consistent 
with the argument that variations in student 
achievement occurred also because of vari- 
ations in such teaching behaviors as content 
coverage, higher order solicitations, vague- 
ness, and positive teacher reactions. Thus, 
the explanatory model and experiments to 
validate it need to incorporate not just sys- 
tematic variations in student characteristics 
but also systematic variations in teaching 
behaviors, after the style of Hughes (1973), 
Nuthall and Church (1973), and the Pro- 
gram on Teaching Effectiveness at the 
Stanford Center for Research and Develop- 
ment in Teaching (1976). 

However, the present evidence suggests 
that even those elaborations on the basic 
model would have been inadequate for ex- 
plaining the variations in student achieve- 
ment found in this study. This is because 
those elaborations would not have taken into 
account the possibility of joint contributions 
of student characteristics and process vari- 
ables. The evidence of this study is that 
such joint contributions account for as much 
as 35% of the variance in student 
achievement. 

Some indication of the ways in which ex- 
perimental studies might be designed to in- 
vestigate such joint contributions emerges 
from the correlations among student char- 
acteristics, process variables, and student 
achievement measures reported in Table 2. 
For there it seems that the more able classes 
were especially advantaged in that they were 
more likely to receive instruction containing 
the more effective elements. More able and 
generally more knowledgeable classes tended 
to be exposed to relevant content more often 
and to teacher vagueness less often, and they 
tended to receive a higher proportion of 
positive teacher reactions, all of which cor- 
related significantly with student achieve- 
ment. Perhaps, therefore, experiments are 
required in which context-process correla- 
tions are manipulated by making such 
teaching behaviors contingent upon such 
student characteristics and in which effects 
of variations in those correlations are in- 
vestigated. 

Just why the correlations discussed above 
should have occurred is beyond the design 
and scope of this study. It has already been 
speculated that teacher expectancy effects 
might have been operating to the advantage 
of more able classes. Alternatively, it might 
be that more concerned and more able 
teachers had been appointed to teach more 
able classes, or that more able classes exerted 
influence during the lessons themselves in 
directions that contributed to greater 
learning on their own part. 

It seems clear that carefully designed ex- 
perimental studies of process-product and 
context-process relationships will continue 
to be required for testing cause and effect 
hypotheses growing out of correlational 
studies such as this one. However, the 
model for apportioning variance presented 
in this study could be subjected to further 
testing with data gathered in studies already 
completed or in progress, at little extra cost. 
If it is the case that conventional procedures 
for taking account of variations in student 
characteristics in detecting the contribution 
of process variables to differences in posttest 
achievement have led to overestimates of the 
influence of the former, then future research, 
either of correlational or experimental de- 
sign, needs to be aware of them. 

From the point of view of identifying 
process variables that correlate significantly 
with student achievement, it did not seem to 
matter very much which measure of student 
achievement was used in this study. Process 
variables that correlated significantly with 
one of the student achievement measures 
tended to be correlated significantly with the 
other two. However, process-product cor- 
relations tended to be lower when residual 
achievement using the class as the statistical 
unit was the criterion than when residual 
achievement using the individual student as 
the statistical unit was the criterion. Pro- 
cess-product correlations tended to be 
highest when unadjusted student achieve- 
ment was used as the criterion. 

Because the particular product variable 
selected as the criterion seemed not to affect 
greatly the identification of significant process 
correlates of achievement and because 
residual achievement scores are calculated 
on the assumption that both areas x and z of 
the model are attributable to the influence 
of student characteristics, thereby obscuring 
joint student characteristics and process 
contributions, there seem to be good reasons 
for preferring unadjusted student achieve- 
ment measures as the criterion variable in 
this study. It would be interesting to see the 
extent to which other research on teaching 
provides similar evidence in favor of unad- 
justed measures of achievement. 

Student anxiety and dogmatism were 
measured in this study to see if they might 
account for variance in student achievement 
over and above that accounted for by general 
ability and prior knowledge. This they did 
not do, presumably because they were both 
highly correlated with general ability and 
prior knowledge. Evidence of these rela- 
tionships with the other student character- 
istics variables and with unadjusted pupil 
achievement has made their inclusion in the 
study worthwhile. However, there were few 
significant correlations between either 
anxiety or dogmatism and process variables. 
That content coverage in general, but 
especially by way of teacher structuring, was 
negatively correlated with student anxiety 
and that student dogmatism was positively 
correlated with teacher vagueness and 
negatively correlated with positive teacher 
reacting suggest that those two student 
characteristics might be worth including in 
subsequent classroom research. 

The implementation of Rosenshine and 
Furst’s (1971) suggestion that relevance of 
classroom activities to criterion tests of 
student achievement be used as a potential 
correlate of achievement was certainly 
worthwhile. This is so not only because 
content coverage was found to be correlated 
significantly with all three measures of 
student achievement, but also because of the 
many significant relationships it bore to 
other variables of both the process and 
student characteristics types. This “opportunity 
to learn” variable probably emerges as 
the most powerful variable measured in the 
study. 

None of the issues explored in this study 
can be considered to have been resolved. 
Evidence gathered from only 29 lessons 
about a single topic is obviously too little for 
that. It is to be hoped, however, that the 
results of the study encourage further ex- 
ploration of those issues, so that over time an 
accumulation of evidence might provide 
sound guidance for researchers and teach- 
ers. 


Reference Note 


1. Gaudry, E. Personal communication, 1971. 
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In recent years, the field of educational psychology has been changing rapidly. The fact 
hat Harcourt Brace Jovanovich is firmly committed to the field is evidenced by the 
[publication of these two excellent textbooks for the introductory course—the course that 
Iprovides the only exposure that many teachers will ever have to this essential body of 
information. Both books offer authoritative scholarship, thorough coverage, and practical 
applications for prospective and in-service teachers. 
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In the Third Edition of Educational Psychology, Dr. 
Cronbach has maintained the distinctive features 
and eclectic approach that characterized the earlier 
editions. At the same time, the text represents a 
thorough revision with expanded treatment of cur- 
rent Gu in educational psychology, the use 
throughout of the most up-to-date research, and the 
sound application of that research to classroom 
situations. Practical problems of teaching and of the 
schools — such as providing social opportunity, 
treating individual differences, planning curricula, 
preparing instructional materials, and classroom 
leadership — are thoroughly examined and many of 
the new case reports deal with everyday problems 
confronted by teachers. In addition, completely 
new features enliven and simplify the text 
presentation — lists of themes and concepts, 
numerous questions for class discussion or written 
assignment, a greatly expanded illustration pro- 
gram, boxed inserts, and chapter-by-chapter anno- 
tated lists of suggested readings. 
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Psychology and Education is a concise yet com- 
prehensive new introduction to educational psy- 
chology, The authors -— an eminent psychologist 
and a professional writer specializing in 
Jeducation — present and interpret up-to-date in- 
ormation in a clear, lively prose style that is readily 
waeccessible to beginning students. Throughout the 

book they emphasize the practical applications of 
educational psychology in the classroom: numerous 
descriptions of actual classroom events, specific 
Mstudent problems, and teachers’ strategies reinforce 
nd apply concepts and ideas to the realities of 
leaching. The book includes authoritative dis- 
ussions of learning, cognitive and affective de- 
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ecent findings on intelligence. intelligence testing, 
low and disabled learners, and social class and 
hnicity. The text is designed in an inviting format, 
ith more than 225 photographs and illustrations 
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Pibliography, 
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Evaluation of Psychological Service 
Delivery Programs is the title of a special 


November, 1977 issue of Professional 
Psychology. Editor Allan Barclay and 
guest Co-editors Bob and Evelyn Perloff 
have assembled a distinguished collec- 
tion of articles that brings readers up to 
dale on the evaluation of psychological 
services. The issue addresses the need 
to justify the effectiveness of service 
delivery programs not only in terms of 
Cost but more importantly in terms of 4 
human benefits, 
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Cummings on psychotherapy; Howard 
Blane on alcoholism; S. B. Sells and 
associates on drug abuse, and NIMH's 
Bertram Brown, Benjamin Liptzin and 
James Stockdill on the federal view of 
mental health program evaluation, 


Ordering Information 


Single copies of the issue are available 
from the Order Department at APA for $5 
each, Allorders $25 or under must be pre- 
paid. Please make checks payable to 
APA. Send your orders to: 


Order Department, P.P, 
s 
N American Psychological 


1200 17th Street, N.W. 
Washington, D.C. 20036 


Theory and Problems of 
Adolescent Developmen 


SECOND EDITION 


By DAVID P. AUSUBEL, M.D., Ph.D., 
RAYMOND MONTEMAYOR, Ph.D., and 
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The Second Edition of this standard text 
been thoroughly revised and expanded to refld 
the developmental differences in American adole 
cence over the past quarter-century. The profe 
sional literature of the past 25 years has beel 
exhaustively reviewed, and all noteworthy ne 
materials have been included. The authors e 
plain adolescence in the United States as a pali 
ticular cultural variant of a pancultural stage” 
personality development that has universal al 
tributes. They describe American adolescents ig 
terms of their actual behavior in the middle 
1970s, clarifying changes in sex role difference: 
attitudes toward parents, peers, school, vocation: 
life, drugs, and sex mores. Chapters on cognitiv 
sexual, and moral development have been ex 
tensively revised and enlarged, and greater a 
tention is paid to juvenile delinquency in middl| 
class groups. The book presents a unified theor 
of a transitional stage of personality developmeti 
that applies to teen-agers in all cultures. It clé 
fies the universal features of adolescence in thë 
context of the current American scene, and cons 
trasts typical American adolescent development 
with that found in many primitive cultures. Be 
havioral changes and disorders are thus placedy 
both developmental and cross-cultural perspecti 
as well as in the perspective of particular A ne 
ican subcultures (e.g, ethnic, racial, and socii 
economic). Encompassed are all aspects of adole 
cence—its biological (genic, endocrinological) a 
cultural determinants as well as the physical 
physiological, personality, moral, cognitive, am 
motor changes and predispositions to behavid 
disorders that accompany it. ^ 
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HENRY W. MORROW, Ed.D. "€ 


The leading authorities in the field of adaptiv 
behavior provide a clear articulation of this frg 
quently misunderstood concept and outline 
techniques by which it is measured. The bo 
presents a new conceptual base for understandin 
adaptive behavior in a functional definition 
Psychological assessment through its intent or pu 
pose. This conceptual base offers a unifying fra 
work through which all the relevant ideas and 
Sues can be comprehended, The book has brod 
implications for the new regulations, laws af 
procedures affecting psychological assessment al 
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Prospective psychology graduate students and 
college counselors will find Graduate Study in 
Psychology for 1979-1980 an indispensable 
resource. This 630-page book published 
by the American Psychological Asso- 
ciation provides specific information 
on more than 500 graduate pro- 
grams in the United States and 
Canada. Each institution lists ap- 
è plication procedures, admission 
requirements, degree require- 
ments, tuition, financial assistance, 
internships, and minority considera- 
tions. General information on applying 
to graduate school is included to help with 
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Possibly your key to greater career 
opportunities. 


Each month 125 to 150 position openings are 
listed by geographical area and employment 
field for psychologists with a minimum of a 
Master's degree. Members may also publish 
availability notices. Coded identification 
numbers assure confidentiality. Ideal for 
employers, students, and Master's and Ph.D 
graduates. 


A one year subscription is as little as 
$9.00 for members, $16.00 for non- 
members. If you're in a hurry, First 
Class delivery is available for an 
additional $4.00. 


For more information, contact: 

The Editor, Employment Bulletin, 

or send your order to: 

American Psychological Association 
Subscription Dept.-Employment Bulletin 
1200 Seventeenth St., N.W. 

Washington, D.C. 20036 


By Ralph F. Blanco and Joseph G. Rosen- 
feld, both of Temple Univ., Philadelphia, 
Pennsylvania. A balanced perspective of the 
varied situations that arise in clinical and 
school psychology is offered to all students 
and practitioners of the psychological disci- 
pline. This text will present a clear under- 
«standing of the stresses and expectations 
placed on the field practitioner. 


The fifteen vividly described case studies en- 
compass behavioral and emotional problems, 
learning disapilities, mental retardation, sen- 
sory impairments, multiple handicaps, and 
giftedness. Symptoms, etiology, treatments 
and follow-ups*are presented for each case 
The emotional states included in these ca 


X-RELATED COGNITIVE DIFFER- 
ENCES: An Essay on Theory and Evidence by 
Julia A. Sherman, Women’s Research Insti- 
tute of Wisconsin, Inc., Madison. Written by a 
leading authority in the field, this provocative 
book reviews and critically evaluates tradi- 
tional theories of sex-related differences in 
ognitive function. Doctor Sherman also ex- 
amines current methodological and concep- 
“tual ‘problems in research. She provides 
up-tó-date appraisals of biological theories of 
sex-related differences in cognition, including 
X-linked theories, the hypothesis of greater 
male variability, metabolic and hormonal the- 
ories, and theories based on purported sex dif- 
ferences in brain lateralization. The latter part 
ol the book considers theory and evidence re- 
lated to. sociocultural determinants. "78, 284 
pp» 7 an cloth-$16.25, paper-$12.25 


BEHAVIOR THERAPY IN CLINICAL 
PRACTICE: Decision Making, Procedure and 
Outcome by Ernest G. Poser, McGill Univ., 
Montreal, Quebec, Canada. Foreword by H. J. 
Eysenck. A clinical psychologist has devel-
oped this material to acquaint mental health
professionals with the practical applications
of social conditioning principles. Divided into
four parts, the text explores specific fears,
physical expressions of social withdrawal, so-
cially disapproved behavior, and current prob-
lems in behavior modification. Specific case
studies are included to illustrate considera-
tions in selecting proper treatment. '77, 204
pp., 1 il., 1 table, $15.75
CASE STUDIES IN CLINICAL AND SCHOOL PSYCHOLOGY
range from homicidal impulses to immobil-
izing anxiety; intelligence extends from an IQ
of 10 to an IQ of 166; settings vary from
ghettos to wealthy estates, schools, and private 
practice. 


A unique feature of this book is found in the 
workbook sections. These sections are de- 
signed to encourage psychology trainees to 
sharpen their diagnostic skills and to create 
prescriptive interventions in four of the cases. 
The authors’ prescribed interventions are 
listed separately for later comparison. Else- 
where in the text, workbook sections of 
another kind permit the student to score and 
interpret responses to Rorschach, TAT, Draw- 
a-Man, TED, sentence completion and Blacky 
projective tests. '78, 256 pp. 4 il., $11.50 
MICROCOUNSELING: Innovations in In-
terviewing, Counseling, Psychotherapy and
Psychoeducation (2nd Ed.) by Allen E. Ivey, 
Univ. of Massachusetts, Amherst, and Jerry 
Authier, Univ. of Nebraska Medical Center, 
Omaha. Foreword by Bernard G. Guerney, Jr. 
(3 Contributors) The Second Edition of this 
groundbreaking introduction to interviewing 
and counseling updates the extensive research 
on the microcounseling model; presents new 
chapters on the cross-cultural and cross-racial 
implications of interviewing training; pro- 
vides material illustrative of how micro- 
counseling skills are used by various types of 
therapists; and includes a new section on op- 
erational definitions. The text continues to 
present a detailed analysis of the interview, 
and an explication of the videotape methods 
used in microcounseling. '78, 624 pp., 13 il.,
$27.00 


COUNSELING IN COMMUNICATIVE 
DISORDERS edited by R. E. Hartbauer, An- 
drews Univ., Berrien Springs, Michigan. (13 
Contributors) Students and practitioners of 
clinical psychology and educational psy- 
chology will appreciate this book's coverage 
of the psychological and emotional problems
and adjustments of their clients. The text is 
written on a personal level, with considerable 
use of anecdotal material. Each chapter fo- 
cuses on a particular communicative problem, 
giving a general overview of the disorder, 
counseling techniques and procedures, and 
suggestions for evaluation of effectiveness. '78, 
336 pp., 2 il., 1 table, $18.25
TEACHING PEOPLE TO LOVE THEMSELVES:


EXPLORATIONS IN SELF-CONCEPT DEVELOPMENT 
with DR. DOV PERETZ ELKINS 


Sponsored by National Humanistic Education Center and 
Growth Associates
A WORKSHOP FOR: Educators, counselors, students, adminis- 
trators, parents, group leaders, mental health (and other helping) 
professionals; clergy, religious educators, couples, and others. 
THE AIMS OF THIS WORKSHOP ARE: To begin to accept and 
explore the positive aspects of our total selves: mind, body, 
emotions, imagination, inner wisdom and life spirit; to feel good 
about ourselves, improve our interpersonal relationships, and our 
professional competence; to learn how to use techniques with 
others. 
DR. DOV PERETZ ELKINS, Director of Growth Associates, is an
international authority on self-concept enhancement. He has
degrees in literature, education, theology and counseling. His
doctorate was written on enhancing self-concept. Dov is a certified
instructor for Parent and Teacher Effectiveness Training.
WHERE AND WHEN: Philadelphia — Wednesday & Thursday, 
December 27-28, 1978; Miami, Florida— Friday & Saturday, 
December 29-30, 1978; Atlanta, Georgia — Saturday & Sunday, 
January 6-7, 1979. Workshop hours: 9 a.m. to 5 p.m. daily. 
HOW MUCH: Tuition is $75.00 per person. A deposit of $25.00 is 
required with your registration. You will receive more information
(including place, accommodation arrangements, times) upon receipt
of registration. 


Registration Form: Teaching People to Love Themselves


Name ____________________ City ____________________
Address ____________________ State ______ Zip ______

I have enclosed the deposit of $25.00 per person. 

Eug a —— Dates. 


I have included $6.50 for a copy of Glad to Be Me: 
Building Self-Esteem in Yourself and Others by Dov 
Peretz Elkins. 

I have included $17.50 for Teaching People to Love
Themselves: A Leader's Handbook of Theory &
Technique for Self-Esteem & Affirmation Training by
Dov Peretz Elkins.

I have included $7.50 for the cassette tape by Dr. Elkins,
The Magic of Self-Esteem.

I am interested in exploring the possibility of an
in-service workshop for teachers, mental health profes- 
sionals and/or others. Please send details. 

MAIL TO: Growth Associates, P.O. Box 8429, Rochester, N.Y.
14618, 716/244-1225. Make all checks payable to Growth


